Standard ML: Datatype vs. Structure

I'm reading through Paulson's ML For the Working Programmer and am a bit confused about the distinction between datatypes and structures.
On p. 142, he defines a type for binary trees as follows:
    datatype 'a tree = Lf
                     | Br of 'a * 'a tree * 'a tree;
This seems to be a recursive definition where 'a denotes some fixed type. So any time I see 'a, it must refer to the same type throughout.
On p. 148, he discusses a structure for binary trees:
"...we have been following an imaginary ML session in which we typed in the tree functions one at a time. Now we ought to collect the most important of those functions into a structure, called Tree. We really must do so, because one of our functions (size) clashes with a built-in function. One reason for using structures is to prevent such name clashes.
We shall, however, leave the datatype declaration of tree outside of the structure. If it were inside, we should be forced to refer to the constructors by Tree.Lf and Tree.Br, which would make our patterns unreadable. Thus, in the sequel, imagine that we have made the following declarations:
    datatype 'a tree = Lf
                     | Br of 'a * 'a tree * 'a tree;
    structure Tree =
      struct
      fun size Lf = 0
        | size (Br(v, t1, t2)) = 1 + size t1 + size t2;
      fun depth...
      etc...
      end;
I'm a little confused.
1) What is the relationship between a datatype and a structure?
2) What is the role of "struct" within the structure definition?
3) Later on, Paulson discusses a structure for dictionaries as binary search trees. He does the following:
    structure Dict : DICTIONARY =
      struct
      type key = string;
      type 'a t = (key * 'a) tree;
      val empty = Lf;
      <a bunch of functions for dictionaries>
This makes me think struct specifies the different primitive or compound types involved in the definition of a Dict.
That's a really fuzzy definition though. Anyone like to clarify?
Thanks for the help,
bclayman

A structure is a module. Everything between the struct and end keywords forms the body of this module. Similarly, you can view a signature as the description of an abstract module interface. Ascribing a signature to a structure (like the : DICTIONARY syntax does in your example) limits the exports of the module to what is specified in that signature (by default, everything would be accessible). That allows you to hide implementation details of a module.
However, ML modules are much richer than that. They can be arbitrarily nested. There are also functors, which are effectively functions from modules to modules ("parameterised modules", if you want). Altogether, the module language in ML forms a full functional language on its own, with structures as the basic entities, functors over them, and signatures describing the "types" of such modules. This little language is a layer on top of the so-called core language, where ordinary values and types live.
So, to answer your individual questions:
1) There is no specific relationship between the datatype and the structure. The latter simply uses the former.
2) struct-end is simply a keyword pair to delimit the structure body (languages in C tradition would probably use curly braces there).
3) As explained above, a structure is a basic module. It can contain (and export) arbitrary other language entities, including other modules. By grouping definitions together, and potentially hiding some of them through a signature ascription, you can express namespacing and encapsulation (in particular, abstract data types).
I should also note that Paulson's book is outdated regarding its description of modules, as it predates the current language version. In particular, it does not describe how to express abstract data types through modules, but instead introduces the obsolete abstype declaration which nobody has been using in almost 20 years. A more extensive and up-to-date introduction to modular programming in ML can be found in Harper's Programming in Standard ML.

In this example, the datatype 'a tree describes a binary tree (https://en.wikipedia.org/wiki/Binary_tree) that can store values of any single type. The 'a in the definition is a type variable, which is instantiated to a concrete type wherever tree is used (for example, int tree or string tree). This allows you to define the structure of a tree once and then use it with any element type later on.
The Tree structure is separate from the datatype definition. It is being used to group functions together that operate on the 'a tree datatype. It is being used right now as a way to modularize the code and, as it points out, to prevent namespace clashes.
struct is just a keyword that tells the compiler where the body of your structure definition starts, while the end keyword tells it where the body ends.
The dictionary structure is defining a dictionary (a key -> value data structure) that uses a tree as the internal data structure. Once again, the structure is a collection of functions that will be used to create and operate on dictionaries. The types within the dictionary structure compose the type of the internal data structure that makes up the dictionary. The following functions define the public interface that you're exposing to allow clients to work with dictionaries.

Related

What is the non-synchronized equivalent to a protected type in Ada?

Ada has a construct called "protected types", where you have a collection of variables and subprograms associated with a type, and the subprograms have implicit synchronization. These types can be instantiated and each instance will have its own memory where the variables live. This looks a lot like the class/object duality in mainstream OOP languages such as C++ and Java, minus inheritance, plus mandatory implicit synchronization.
Is there an equivalent to this construct, minus the synchronization? If not, what's the rationale behind this design choice?
To be entirely clear, I'm aware that Ada supports different styles of OOP without any kind of synchronization. My question is about the specific style of OOP I mentioned - as it is one of the most common styles found in mainstream languages, and is indeed also present in Ada in some form.
To further clarify the question, which had been intentionally (and misguidedly) left open-ended, I am aware that the answer is "packages". But then, consider the following:
We have packages, which are units containing variables and subprograms, of which several instances can be created
We have types, which are enums or projections/mod of built-in types (I know this is a very approximate definition, specifics don't really matter here)
We have protected types, which are... units containing variables and subprograms, of which several instances can be created. Plus, they have synchronization.
This begs the thought: why "protected types" and not "protected packages"? This thought is the origin of the present question.
OOP is a set of programming concepts that does not depend on any particular syntax. According to the Ada 95 Rationale: "Type extension in Ada 95 builds upon the existing Ada 83 concept of a derived type. In Ada 83, a derived type inherited the operations of its parent and could add new operations; however, it was not possible to add new components to the type. The whole mechanism was thus somewhat static. By contrast, in Ada 95 a derived type can also be extended to add new components."
In Ada a type is a type, whether or not it provides OOP features. Ada 95 added type extension on top of the other OOP features already provided by Ada 83 types. The advantage is that you can easily turn a non-tagged type into a tagged type if you later need type extension, without affecting current uses of the type. This also avoids introducing hidden features in the OOP syntax, like friend classes (types sharing the package), static members (global package variables), the implicit this, or const at the end of a method to indicate that this object is not modified, etc.
Why do protected types not follow this pattern? They probably follow that of Ada 83 task types, but the latter don't have a private part, so it is still inconsistent. The designers probably took the syntax of task types as inspiration, but added a private part for efficiency (that was the main concern: "protected types allows a more efficient implementation of standard problems of shared data access").
So this is an answer to the title of this question:
"What is the non-synchronized equivalent to a protected type in Ada?"
I'm adding this mainly for people searching this question looking for the answer to the topic's title question.
Take a simple example protected type:
    protected type My_Type is
       procedure Set_Value(Value : Integer);
       function Get_Value return Integer;
    private
       The_Value : Integer := 0;
    end My_Type;
    protected body My_Type is
       procedure Set_Value(Value : Integer) is
       begin
          The_Value := Value;
       end Set_Value;
       function Get_Value return Integer is
       begin
          return The_Value;
       end Get_Value;
    end My_Type;
The equivalent non-synchronized version would be to use a record type (or a tagged record if you want type extension) within a package, paired with the operations on that type:
    package My_Types is
       -- For type extension use:
       -- type My_Type is tagged private;
       type My_Type is private;
       procedure Set_Value(Self : in out My_Type; Value : Integer);
       function Get_Value(Self : My_Type) return Integer;
    private
       -- For type extension use:
       -- type My_Type is tagged record
       type My_Type is record
          The_Value : Integer := 0;
       end record;
    end My_Types;
    package body My_Types is
       procedure Set_Value(Self : in out My_Type; Value : Integer) is
       begin
          Self.The_Value := Value;
       end Set_Value;
       function Get_Value(Self : My_Type) return Integer is
       begin
          return Self.The_Value;
       end Get_Value;
       -- Alternate syntax:
       -- function Get_Value(Self : My_Type) return Integer is (Self.The_Value);
    end My_Types;
Operations declared within a package that operate on a type in the package before that type is frozen are "associated" to that type (Ada calls them primitive operations). This includes functions that return those types.
For the "why" each layout was chosen differently, I don't really know. It might be helpful to take a look at the bottom of the following page and look through all the comments/emails/discussions of the ARG (credit to Simon Wright for the initial link):
http://archive.adaic.com/standards/ada95.html
If I had to guess without fully reading those sections Simon pointed me to (I will get to reading them all the way through), I would wager it has to do with the fact that records already existed before protected types did, and protected types were thought of more as an extension of the tasking model, so the designers iterated on the task type layout for protected types. Some of what I did read (here and here) already led me to believe they ran into some issues (either technical or philosophical) trying to lay out protected types more like records.
Note that protected types do not give the full set of "information hiding" capabilities as most programmers expect, such as public vs private member variables (only private for protected types).
Credit to Simon Wright for the links I provided
The standard way to define a complete type (data + operations) in Ada is with a package containing the type declaration (often private) and the subprograms for the type.
In general, encapsulation and information hiding (package) are orthogonal to types and subprograms in Ada. In many commonly used languages, encapsulation and information hiding are provided only by the class construct.
This is a bit of a ramble round the topic ...
If you had a protected package, what parts of its contents would be synchronised? Any variable, spec or body? any type? child packages? And, to be able to create multiple instances of the package, it’d have to be generic. How then could you create an instance within a record? I think it needs to be a type.
As I understand it, there’s not really a parallel to package in C++, so you’d have to say protected class Foo ... which seems hard to distinguish from a protected type.
Given packages, which already encapsulate everything else, I guess the design team could have gone with something like
    type P is record
       ...
    end record;
    pragma Protected (P);
where primitive operations of P would be synchronised, but you then have the problem of clarity (primitiveness being easy to get wrong) and of visibility (you really don’t want any of the components of P being accessible from outside). What syntax do we use for entry operations? Protected types seem a reasonable compromise.
Is there an equivalent to this construct, minus the synchronization?
If not, what's the rationale behind this design choice?
Ok, the other answers are really quite good, but here's the simple answer:
Ada defines a “type” as a set of values and a set of operations on those values; the notion of “subtype” is likewise defined as a type with an additional (possibly null) set of constraints on its values. This leads to the ability to say “subtype Natural is Integer range 0 .. Integer'Last;”. In Ada 83 there was no way to add values to a type, but there was type derivation, where you could 'inherit' a type, possibly adding other operations and/or altering representational items. (Thus you could have “type Native_Data is array (1 .. 10, 1 .. 200) of Integer;” and “type External_Data is new Native_Data;” with “pragma Convention (Fortran, External_Data);” [1] and convert between native and external formats via conversion: Data := Native_Data (From_Disk (File => "Import.dat")).)
So Ada 95 built atop type derivation, allowing more values to be added (type extension) as well as more operations. Ada 95 also extended the library/compilation-unit structure from a 'flat' notion to a hierarchical one, but the basic unit of organization was (and still is) the package.
Now we get to protected types. Protected types are synchronization types, with the data encapsulated in the construct and manipulated via accessors and mutators. This construct is pretty much the bastard child of packages and tasks: its structure is reminiscent of the package, and it has the queue-like access (entries, functions, procedures) of tasks, albeit a bit more 'exposed'/explicit than the implicit nature of task entries and the rendezvous.
So then, what is a protected type without synchronization?
Simple, a regular type.
This begs the thought: why "protected types" and not "protected packages"?
While I'm sure that the above provides enough information for you to suss things out, the simple answer is this:
Packages are really interfaces (in the general notion, not the keyword/tagged-type notion) and namespaces: they declare the public view and also segregate the private implementation, as well as encapsulating the scope of the things within.
Thus a “Protected Package” would essentially be the protected type "but with namespacing" — and thus be a really redundant construct, not to mention that one of the motivating factors for protected types was the ability to drop the active thread of control required from tasks for synchronization: all that can be handled by the compiler inserting the proper queuing/bookkeeping around accesses without any of the complexity (and timing/scheduling impositions) that a task would require — so there would have to be special rules for a "protected package" either disallowing Task or requiring some special form, which would add complexity to the compiler.
1 — Fortran uses column-major ordering for its multidimensional arrays, Ada uses row-major ordering [I don't recall if this is required by the LRM]; this 'trick' allows you to have the compiler handle the "trans-positioning", as well as using the type-system to keep track of which is which. (You can use this with things like network-format vs native-format in protocols, too.)

Restructuring an OOP datatype into Haskell types

Coming from an OOP background, I find Haskell's type system and the way data constructors and typeclasses interact difficult to conceptualize. I can understand how each is used in simple examples, but some more complicated examples of data structures that are very well suited to an OOP style are proving non-trivial to translate into similarly elegant and understandable types.
In particular, I have a problem with organizing a hierarchy of data such as the following.
This is a deeply nested hierarchical inheritance structure, and the lack of support for subtyping makes it unclear how to turn this structure into a natural-feeling alternative in Haskell. It may be fine to replace something like Polygon with a sum data type, declaring it like
    data Polygon
      = Quad Point Point
      | Triangle Point Point Point
      | RegularNGon Int Radius
      | ...
But this loses some of the structure, and can only really satisfactorily be done for one level of the hierarchy. Typeclasses can be used to implement a form of inheritance and substructure in that a Polygon typeclass could be a subclass of a Shape, and so maybe all Polygon instances have implementations for centroid :: Point and also vertices :: [Point], but this seems unsatisfactory. What would be a good way of capturing the structure of the picture in Haskell?
You can use sum types to represent the entire hierarchy, without losing structure. Something like this would do it:
    data Shape = IsPoint Point
               | IsLine Line
               | IsPolygon Polygon
    data Point = Point { x :: Int, y :: Int }
    data Line = Line { a :: Point, b :: Point }
    data Polygon = IsTriangle Triangle
                 | IsQuad Quad
                 | ...
And so on. The basic pattern is you translate each OO abstract class into a Haskell sum type, with each of its immediate OO subclasses (that may themselves be abstract) as variants in the sum type. The concrete classes are product/record types with the actual data members in them. [1]
The thing you lose compared to the OOP you're used to by modeling things this way isn't the ability to represent your hierarchy, but the ability to extend it without touching existing code. Sum types are "closed", where OO inheritance is "open". If you later decide that you want a Circle option for Shape, you have to add it to Shape and then add cases for it everywhere you pattern match on a Shape.
However, this kind of hierarchy probably requires fairly liberal downcasting in OO. For example, if you want a function that can tell whether two shapes intersect, that's probably an abstract method on Shape like Shape.intersects(Shape other), so each subtype gets to write its own implementation. But when I'm writing Rectangle.intersects(Shape other), it's basically impossible to do generically, without knowing what other subclasses of Shape are out there. I'll have to use isinstance checks to see what other actually is. But that means I probably can't just add my new Circle subclass without revisiting existing code; an OO hierarchy where isinstance checks are needed is de facto just as "closed" as the Haskell sum type hierarchy is. Basically, pattern matching on one of the sum types generated by applying this pattern is the equivalent of isinstance checks and downcasting in the OO version. The difference is that, because the sum types are exhaustively known to the compiler (only possible because they're closed), if I do add a Circle case to Shape the compiler is able to tell me about all the places that I need to revisit to handle that case. [2]
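To make the sum-type translation concrete, here is a minimal sketch. The field names, the two-corner Quad, and the vertexCount function are my own assumptions for illustration; they are not taken from the question's diagram.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Leaf ("concrete class") types: plain product types holding the data.
data Point    = Point { px :: Double, py :: Double }
data Line     = Line { lineA :: Point, lineB :: Point }
data Triangle = Triangle Point Point Point
data Quad     = Quad Point Point  -- two opposite corners (assumed layout)

-- "Abstract class" types: closed sums over their immediate subclasses.
data Polygon = IsTriangle Triangle | IsQuad Quad
data Shape   = IsPoint Point | IsLine Line | IsPolygon Polygon

-- Pattern matching plays the role of isinstance checks plus downcasts.
vertexCount :: Shape -> Int
vertexCount (IsPoint _)                = 1
vertexCount (IsLine _)                 = 2
vertexCount (IsPolygon (IsTriangle _)) = 3
vertexCount (IsPolygon (IsQuad _))     = 4
```

If a Circle variant were later added to Shape, the incomplete-patterns warning enabled at the top should flag vertexCount as missing a case, which is the compiler assistance described above.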
If you have a hierarchy that doesn't need a lot of downcasting (that is, the various base classes have substantial and useful interfaces that they guarantee to be available, and you usually use things through those interfaces rather than switching on what a value could possibly be), then you can probably use type classes. You still need all the "leaf" data types (the product types with the actual data fields), only instead of adding sum type wrappers to group them up you add type classes for the common interface. If you can use this style of translation, then you can add new cases more easily (just add the new Circle data type, and an instance to say how it implements the Shape type class; all the places that are polymorphic in any type in the Shape class will now handle Circles as well). But if you're doing that in OO you always have downcasts available as an escape hatch when it turns out you can't handle shapes generically; with this design in Haskell it's impossible. [3]
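A rough sketch of the type class style of translation follows. The vertices/centroid interface borrows the names from the question; the averaging definition of centroid and the describe function are assumptions added for illustration.

```haskell
data Point = Point Double Double deriving (Eq, Show)

-- The shared interface that every shape type promises to provide.
class Shape a where
  vertices :: a -> [Point]

  -- Default method written only against the interface: the average of the
  -- vertices (exact for segments and triangles, a simplification otherwise).
  centroid :: a -> Point
  centroid s = Point (avg xs) (avg ys)
    where
      ps = vertices s
      xs = [x | Point x _ <- ps]
      ys = [y | Point _ y <- ps]
      avg vs = sum vs / fromIntegral (length vs)

-- Leaf data types, each with its own instance.
data Line     = Line Point Point
data Triangle = Triangle Point Point Point

instance Shape Line where
  vertices (Line a b) = [a, b]

instance Shape Triangle where
  vertices (Triangle a b c) = [a, b, c]

-- Polymorphic in any Shape instance; a new leaf type needs no change here.
describe :: Shape a => a -> String
describe s =
  show (length (vertices s)) ++ " vertices, centroid " ++ show (centroid s)
```

Adding, say, a Pentagon type only requires a new data declaration and instance. The trade-off is that describe cannot inspect which concrete type it was given.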
But my "real" answer to "how do I represent OO type hierarchies in Haskell" is unfortunately the trite one: I don't. I design differently in Haskell than I do in OO languages [4], and in practice it's just not a huge problem. But to say how I'd design this case differently, I'd have to know more about what you're using them for. For example, you could represent a shape as a Point -> Bool function (that tells you whether any given point is inside the shape), and have things like circle :: Point -> Int -> (Point -> Bool) for generating such functions corresponding to normal shapes; that representation is awesome for forming composite intersection/union shapes without knowing anything about them (intersect shapeA shapeB = \point -> shapeA point && shapeB point), but terrible for calculating things like areas and circumferences.
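The function-based representation can be sketched in a few lines. The circle and intersect signatures follow the ones mentioned above; the tuple-based Point, the rect helper, and union are assumptions added here for illustration.

```haskell
type Point = (Int, Int)

-- A shape is its membership test: True when the point lies inside it.
type Shape = Point -> Bool

-- Disc with the given centre and radius.
circle :: Point -> Int -> Shape
circle (cx, cy) r = \(x, y) -> (x - cx) ^ 2 + (y - cy) ^ 2 <= r ^ 2

-- Axis-aligned rectangle between two corners (hypothetical helper).
rect :: Point -> Point -> Shape
rect (x0, y0) (x1, y1) = \(x, y) -> x0 <= x && x <= x1 && y0 <= y && y <= y1

-- Composite shapes need no knowledge of what they are composed from.
intersect, union :: Shape -> Shape -> Shape
intersect a b = \p -> a p && b p
union a b = \p -> a p || b p
```

For instance, (circle (0, 0) 5 `intersect` rect (0, 0) (10, 10)) is itself a Shape that can be combined further, though nothing in this representation lets you compute its area.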
[1] If you have abstract classes with data members, or concrete classes that also have further subclasses, you have several options: manually push the data members down into the "leaves"; factor out the inherited data members into a shared record and make all of the "leaves" contain one of those; or split a layer so that you have a product type containing the inherited data members and a sum type (where that sum type then "splits" into the options for the subclasses).
[2] If you use catch-all patterns then the compiler's warnings won't catch everything, so it's not always bulletproof, but how bulletproof it is depends on how you code.
[3] Unless you opt into runtime type information with a solution like Typeable, but that's not an invisible change; your callers have to opt into it as well.
[4] Actually I probably wouldn't design a hierarchy like this even in OO languages. I find it doesn't turn out to be as useful as you'd think in real programs, hence the "favour composition over inheritance" advice.
You may be looking for a Haskell equivalent of dynamic dispatch, such that you could store a heterogeneous list of values supporting distinct implementations of a common Shape interface.
Haskell's existential types support this kind of usage. It's fairly rare for a Haskell program to actually need existential types -- as Ben's answer demonstrates, sum types can handle this kind of problem. However, existential types are appropriate for a large, open-ended collection of cases:
    {-# LANGUAGE ExistentialQuantification #-}
    ...
    class Shape a where
      bounds :: a -> AABB
      draw :: a -> IO ()
    data AnyShape = forall a. Shape a => AnyShape a
This lets you declare instances in an open-ended style:
    data Line = Line Point Point
    instance Shape Line where ...
    data Circle = Circle {center :: Point, radius :: Double}
    instance Shape Circle where ...
    ...
Then, you can build your heterogeneous list:
    shapes = [AnyShape (Line a b),
              AnyShape (Circle a 3.0),
              AnyShape (Circle b 1.8)]
and use it in a uniform way:
    drawIn box xs = sequence_ [draw s | AnyShape s <- xs, bounds s `hits` box]
Note that you need to unwrap your AnyShape in order to use the class Shape interface functions. Also note that you must use the class functions to access your heterogeneous data -- there is no other way to "downcast" the unwrapped existential value s! Its type only makes sense within the local scope, so the compiler will not let it escape.
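For anyone who wants to load the fragments above into GHCi, here is one way they might be completed. The AABB representation, the hits overlap test, the instance bodies, and countIn are all assumptions filled in for illustration; the answer itself leaves them unspecified.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Point = Point Double Double

-- Axis-aligned bounding box: lower-left and upper-right corners (assumed).
data AABB = AABB Point Point

-- Do two boxes overlap? (assumed meaning of `hits`)
hits :: AABB -> AABB -> Bool
hits (AABB (Point ax0 ay0) (Point ax1 ay1)) (AABB (Point bx0 by0) (Point bx1 by1)) =
  ax0 <= bx1 && bx0 <= ax1 && ay0 <= by1 && by0 <= ay1

class Shape a where
  bounds :: a -> AABB
  draw   :: a -> IO ()

-- The wrapper hides the concrete type and keeps only the Shape dictionary.
data AnyShape = forall a. Shape a => AnyShape a

data Line   = Line Point Point
data Circle = Circle { center :: Point, radius :: Double }

instance Shape Line where
  bounds (Line (Point x0 y0) (Point x1 y1)) =
    AABB (Point (min x0 x1) (min y0 y1)) (Point (max x0 x1) (max y0 y1))
  draw _ = putStrLn "line"

instance Shape Circle where
  bounds (Circle (Point x y) r) =
    AABB (Point (x - r) (y - r)) (Point (x + r) (y + r))
  draw _ = putStrLn "circle"

-- Draw every shape whose bounding box overlaps the given box.
drawIn :: AABB -> [AnyShape] -> IO ()
drawIn box xs = sequence_ [draw s | AnyShape s <- xs, bounds s `hits` box]

-- Pure counterpart: how many shapes overlap the box.
countIn :: AABB -> [AnyShape] -> Int
countIn box xs = length [() | AnyShape s <- xs, bounds s `hits` box]
```

Inside the comprehension, s can only be used through bounds and draw, which is the "no downcast" restriction described above.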
If you are trying to use existential types, yet find yourself needing to "downcast" them, sum types might be a better fit.

Difference between modules and existentials

It's folk knowledge that OCaml modules are "just" existential types. That there's some kind of parity between
    module X = struct type t val x : t end
and
    type 'a spec = { x : 'a }
    type x = X : 'a spec -> x
and this isn't untrue exactly.
But as I just evidenced, OCaml has both modules and existential types. My question is:
How do they differ?
Is there anything which can be implemented in one but not the other?
When would you use one over the other (in particular comparing first-class modules with existential types)?
Completing gsg's answer on your third point.
There are two ways to use modules:
As a structuring construct, when you declare toplevel modules. In that case you are not really manipulating existential variables. When encoding the module system in System F, you would effectively represent the abstract types by existential variables, but morally, it is closer to a fresh singleton type.
As a value, when using first-class modules. In that case you are clearly manipulating existential types.
The other representations of existential types are through GADTs and through objects. (It is also possible to encode existentials as the negation of universals with records, but that usage has been completely replaced by first-class modules.)
Choosing between those three options depends a bit on the context.
If you want to provide a lot of functions for your type, you will prefer modules or objects. If only a few, you may find the syntax for modules or objects too heavyweight and prefer GADTs. GADTs can also reveal the structure of your type, for instance:
    type _ ty =
      | List : 'a ty -> 'a list ty
      | Int : int ty
    type exist = E : 'a ty * 'a -> exist
In that kind of case, you do not need to pass around the functions working on that type, so you end up with something a lot lighter using GADT existentials. With modules this would look like
    module type Exist = sig
      type t
      val t : t ty
    end
    module Int_list : Exist = struct
      type t = int list
      let t = List Int
    end
    let int_list = (module Int_list : Exist)
And if you need subtyping or late binding, go for objects. This can often be encoded with modules, but that tends to be tedious.
It's specifically abstract types that have existential type. Modules without abstract types can be explained without existentials, I think.
Modules have features other than abstract types: they act as namespaces, they are structurally typed, they support operations like include and module type of, they allow private types, etc.
A notable difference is that functors allow ranging over types of any (fixed) arity, which is not possible with type variables because OCaml lacks higher kinded types:
    module type M = sig
      type 'a t
      val x : 'a t
    end
I'm not quite sure how to answer your last question. Modules and existentials are different enough in practice that the question of when to substitute one for the other hasn't come up.

What is the difference between the concept of 'class' and 'type'?

I know this question has already been asked, but I didn't quite get it. I would like to know which is the more basic concept, class or type. I have a few questions; please clear these up for me:
Is type the basis of a programming data type?
Is a type hard-coded into the language itself, while a class is something we can define ourselves?
What are untyped languages? Please give some examples.
Is it right that type is not purely an OOP concept, that is, it is not restricted to the OOP world?
Please clear this up for me, thanks.
I haven't worked with many languages, so what follows may only be correct for Java, C#, and Objective-C.
1/ I think "type" is just what people mean when they talk about a data type.
2/ No. We can define both types and classes ourselves. An object of class A has type A. For example, if we define String s = "123"; then s has type String and belongs to class String. But the reverse does not always hold.
For example:
    class B {}
    class A extends B {}
    B b = new A();
Here you can say that b has type B and belongs to both class A and class B, but b doesn't have type A.
3/ An untyped language (more precisely, a dynamically typed one) is a language that allows you to change the type of the value held by a variable, as in JavaScript.
    var s = "123"; // type string
    s = 123; // now type number
4/ I don't know much about this, but I think it is not restricted to OOP. It applies to procedural programming as well.
It may well depend on the language. I treat types and classes as the same thing in OO, only making a distinction between class (the definition of a family of objects) and instance (or object), specific concrete occurrences of a class.
I come originally from a C world where there was no real difference between language-defined types like int and types that you made yourself with typedef or struct.
Likewise, in C++, there's little difference (probably none) between std::string and any class you put together yourself, other than the fact that std::string will almost certainly be bug-free by now. The same isn't necessarily true of our own code :-)
I've heard people suggest that types are classes without methods but I don't believe that distinction (again because of my C/C++ background).
There is a fundamental difference in some languages between integral (in the sense of integrated rather than integer) types and class types. Classes can be extended but int and float (examples for C++) cannot.
In OOP languages, a class specifies the definition of an object. In many cases, that object can serve as a type for things like parameter matching in a function.
So, for an example, when you define a function, you specify the type of data that should be passed to the function and the type of data that is returned:
int AddOne(int value) { return value+1; } uses int types for the return value and the parameter being passed in.
In languages that have both, the concepts of type and class/object can almost become interchangeable. However, there are many languages that do not have both. For instance, I believe that standard C has no support for custom-defined objects, but it certainly does still have types. On the other hand, both PHP and JavaScript are examples of languages where type is very loosely defined (basically, types are either single item, collection/array/object, or undefined [JS only]), but they have full support for classes/objects.
Another key difference: you can have methods and custom-functions associated with a class/object, but not with a standard data-type.
Hopefully that clarified some. To answer your specific questions:
In some ways, type could be considered a base concept of programming, yes.
Yes, with the exception that classes can be treated as types in functions, as in the example above.
An untyped language is one that lets you use any type of variable interchangeably, meaning that you can handle a string with the same code that handles an int, for instance. In practice most 'untyped' languages actually implement a concept called duck typing, so named because they say that 'if it acts like a duck, it should be treated like a duck', and attempt to use any variable as the type that makes sense for the code encountered. Again, PHP and JavaScript are two languages which do this.
Very true, type is applicable outside of the OOP world.

How to model class hierarchies in Haskell?

I am a C# developer. Coming from OO side of the world, I start with thinking in terms of interfaces, classes and type hierarchies. Because of lack of OO in Haskell, sometimes I find myself stuck and I cannot think of a way to model certain problems with Haskell.
How to model, in Haskell, real world situations involving class hierarchies such as the one shown here: http://www.braindelay.com/danielbray/endangered-object-oriented-programming/isHierarchy-4.gif
First of all: Standard OO design is not going to work nicely in Haskell. You can fight the language and try to make something similar, but it will be an exercise in frustration. So step one is look for Haskell-style solutions to your problem instead of looking for ways to write an OOP-style solution in Haskell.
But that's easier said than done! Where to even start?
So, let's disassemble the gritty details of what OOP does for us, and think about how those might look in Haskell.
Objects: Roughly speaking, an object is the combination of some data with methods operating on that data. In Haskell, data is normally structured using algebraic data types; methods can be thought of as functions taking the object's data as an initial, implicit argument.
Encapsulation: However, the ability to inspect an object's data is usually limited to its own methods. In Haskell, there are various ways to hide a piece of data; two examples are:
Define the data type in a separate module that doesn't export the type's constructors. Only functions in that module can inspect or create values of that type. This is somewhat comparable to protected or internal members.
Use partial application. Consider the function map with its arguments flipped. If you apply it to a list of Ints, you'll get a function of type (Int -> b) -> [b]. The list you gave it is still "there", in a sense, but nothing else can use it except through the function. This is comparable to private members, and the original function that's being partially applied is comparable to an OOP-style constructor.
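As a rough sketch of the partial-application idea (the names withItems and hiddenItems are made up for illustration and do not come from any library):

```haskell
-- map with its arguments flipped: the list comes first.
withItems :: [a] -> (a -> b) -> [b]
withItems = flip map

-- Partially applying it acts like a constructor: the result closes over
-- the list, and callers can only reach the Ints by handing in a function.
hiddenItems :: (Int -> b) -> [b]
hiddenItems = withItems [1, 2, 3]
```

For example, hiddenItems (* 2) gives [2,4,6] and hiddenItems show gives ["1","2","3"]; the list is only reachable through the function.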
"Ad-hoc" polymorphism: Often, in OO programming we only care that something implements a method; when we call it, the specific method called is determined based on the actual type. Haskell provides type classes for compile-time function overloading, which are in many ways more flexible than what's found in OOP languages.
Code reuse: Honestly, my opinion is that code reuse via inheritance was and is a mistake. Mix-ins as found in something like Ruby strike me as a better OO solution. At any rate, in any functional language, the standard approach is to factor out common behavior using higher-order functions, then specialize the general-purpose form. A classic example here is the family of fold functions, which generalize almost all iterative loops, list transformations, and linearly recursive functions.
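To make the fold point concrete, here is a small sketch that specializes the Prelude's foldr three ways. The names sumList, mapList and filterList are invented for this example so they do not clash with the built-in versions:

```haskell
-- A loop that accumulates a total.
sumList :: [Int] -> Int
sumList = foldr (+) 0

-- A list transformation: rebuild the list, applying f to each element.
mapList :: (a -> b) -> [a] -> [b]
mapList f = foldr (\x acc -> f x : acc) []

-- A list transformation that keeps only the elements satisfying p.
filterList :: (a -> Bool) -> [a] -> [a]
filterList p = foldr (\x acc -> if p x then x : acc else acc) []
```

Here sumList [1,2,3] is 6, mapList (+ 1) [1,2,3] is [2,3,4], and filterList even [1,2,3,4] is [2,4]. The recursion lives in foldr alone; each function only supplies the step and the starting value.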
Interfaces: Depending on how you're using an interface, there are different options:
To decouple implementation: Polymorphic functions with type class constraints are what you want here. For example, the function sort has type (Ord a) => [a] -> [a]; it's completely decoupled from the details of the type you give it other than it must be a list of some type implementing Ord.
Working with multiple types with a shared interface: For this you need either a language extension for existential types, or to keep it simple, use some variation on partial application as above--instead of values and functions you can apply to them, apply the functions ahead of time and work with the results.
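One way to read "apply the functions ahead of time" is a plain record holding the results every implementation must provide. This is only a sketch under that assumption; Shape, circle, rectangle and totalArea are hypothetical names chosen for the example:

```haskell
-- Everything the rest of the program may know about any shape.
data Shape = Shape
  { shapeName :: String
  , shapeArea :: Double
  }

-- Each "implementation" is just a function that fills in the record.
circle :: Double -> Shape
circle r = Shape { shapeName = "circle", shapeArea = pi * r * r }

rectangle :: Double -> Double -> Shape
rectangle w h = Shape { shapeName = "rectangle", shapeArea = w * h }

-- One ordinary list, built from two different implementations.
shapes :: [Shape]
shapes = [circle 1.0, rectangle 2.0 3.0]

totalArea :: [Shape] -> Double
totalArea = sum . map shapeArea
```

No language extension is needed, and thanks to laziness the fields are only computed if something actually asks for them.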
Subtyping, a.k.a. the "is-a" relationship: This is where you're mostly out of luck. But--speaking from experience, having been a professional C# developer for years--cases where you really need subtyping aren't terribly common. Instead, think about the above, and what behavior you're trying to capture with the subtyping relationship.
You might also find this blog post helpful; it gives a quick summary of what you'd use in Haskell to solve the same problems that some standard Design Patterns are often used for in OOP.
As a final addendum, as a C# programmer, you might find it interesting to research the connections between it and Haskell. Quite a few people responsible for C# are also Haskell programmers, and some recent additions to C# were heavily influenced by Haskell. Most notable is probably the monadic structure underlying LINQ, with IEnumerable being essentially the list monad.
Let's assume the following operations: Humans can speak, Dogs can bark, and all members of a species can mate with members of the same species if they are of opposite gender. I would define this in Haskell like this:
data Gender = Male | Female deriving Eq
class Species s where
    gender :: s -> Gender
-- Returns True if s1 and s2 can conceive offspring
matable :: Species a => a -> a -> Bool
matable s1 s2 = gender s1 /= gender s2
data Human = Man | Woman
data Canine = Dog | Bitch
instance Species Human where
    gender Man = Male
    gender Woman = Female
instance Species Canine where
    gender Dog = Male
    gender Bitch = Female
bark :: Canine -> String
bark Dog = "woof"
bark Bitch = "wow"
speak :: Human -> String -> String
speak Man s = "The man says " ++ s
speak Woman s = "The woman says " ++ s
Now the operation matable has type Species s => s -> s -> Bool, bark has type Canine -> String and speak has type Human -> String -> String.
I don't know whether this helps, but given the rather abstract nature of the question, that's the best I could come up with.
Edit: In response to Daniel's comment:
A simple hierarchy for collections could look like this (ignoring already existing classes like Foldable and Functor):
import Prelude hiding (Foldable) -- a modern Prelude exports its own Foldable class
class Foldable f where
    fold :: (a -> b -> a) -> a -> f b -> a
class Foldable m => Collection m where
    cmap :: (a -> b) -> m a -> m b
    cfilter :: (a -> Bool) -> m a -> m a
class Indexable i where
    atIndex :: i a -> Int -> a
instance Foldable [] where
    fold = foldl
instance Collection [] where
    cmap = map
    cfilter = filter
instance Indexable [] where
    atIndex = (!!)
sumOfEvenElements :: (Integral a, Collection c) => c a -> a
sumOfEvenElements c = fold (+) 0 (cfilter even c)
Now sumOfEvenElements takes any kind of collection of integrals and returns the sum of all even elements of that collection.
Instead of classes and objects, Haskell uses abstract data types. These are really two compatible views on the problem of organizing ways of constructing and observing information. The best help I know of on this subject is William Cook's essay Object-Oriented Programming Versus Abstract Data Types. He has some very clear explanations to the effect that
In a class-based system, code is organized around different ways of constructing abstractions. Generally each different way of constructing an abstraction is assigned its own class. The methods know how to observe properties of that construction only.
In an ADT-based system (like Haskell), code is organized around different ways of observing abstractions. Generally each different way of observing an abstraction is assigned its own function. The function knows all the ways the abstraction could be constructed, and it knows how to observe a single property, but of any construction.
Cook's paper will show you a nice matrix layout of abstractions and teach you how to organize any class as an ADT or vice versa.
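The two ways of slicing that matrix can be sketched on a tiny expression type. This example is my own illustration rather than one taken from Cook's paper, and all of the names in it are hypothetical:

```haskell
-- ADT view: one data type, one function per observation.
-- Each function knows every way a value could have been constructed.
data Expr = Lit Int | Add Expr Expr

eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b

render :: Expr -> String
render (Lit n)   = show n
render (Add a b) = "(" ++ render a ++ " + " ++ render b ++ ")"

-- Object view: one function per construction.
-- Each one knows how to answer every observation for its own case.
data ExprObj = ExprObj
  { evalObj   :: Int
  , renderObj :: String
  }

litObj :: Int -> ExprObj
litObj n = ExprObj { evalObj = n, renderObj = show n }

addObj :: ExprObj -> ExprObj -> ExprObj
addObj a b = ExprObj
  { evalObj   = evalObj a + evalObj b
  , renderObj = "(" ++ renderObj a ++ " + " ++ renderObj b ++ ")"
  }
```

Adding a new observation is a one-function change in the ADT view but touches every constructor function in the object view; adding a new kind of expression is the reverse.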
Class hierarchies involve one more element: the reuse of implementations through inheritance. In Haskell, such reuse is achieved through first-class functions instead: a function in a Primate abstraction is a value and an implementation of the Human abstraction can reuse any functions of the Primate abstraction, can wrap them to modify their results, and so on.
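A minimal sketch of that kind of reuse, assuming a hypothetical Creature record of functions (none of these names come from a library):

```haskell
-- The abstraction is a record whose fields are ordinary values and functions.
data Creature = Creature
  { describe :: String
  , eat      :: String -> String
  }

primate :: String -> Creature
primate n = Creature
  { describe = n ++ " is a primate"
  , eat      = \food -> n ++ " eats " ++ food
  }

-- human reuses the primate's eat unchanged and wraps its describe.
human :: String -> Creature
human n = base { describe = describe base ++ " that can speak" }
  where
    base = primate n
```

With these definitions, eat (human "Ada") "fruit" is "Ada eats fruit" (reused as is) and describe (human "Ada") is "Ada is a primate that can speak" (wrapped).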
There is not an exact fit between design with class hierarchies and design with abstract data types. If you try to transliterate from one to the other, you will wind up with something awkward and not idiomatic—kind of like a FORTRAN program written in Java.
But if you understand the principles of class hierarchies and the principles of abstract data types, you can take a solution to a problem in one style and craft a reasonably idiomatic solution to the same problem in the other style. It does take practice.
Addendum: It's also possible to use Haskell's type-class system to try to emulate class hierarchies, but that's a different kettle of fish. Type classes are similar enough to ordinary classes that a number of standard examples work, but they are different enough that there can also be some very big surprises and misfits. While type classes are an invaluable tool for a Haskell programmer, I would recommend that anyone learning Haskell learn to design programs using abstract data types.
Haskell, my favorite language, is a pure functional language.
Functions have no side effects, and there is no assignment.
If you find the transition to this language too hard, F# may be a better place to start with functional programming, since F# is not pure.
Objects encapsulate state. There is a way to achieve this in Haskell, but it is one of the topics that takes more time to learn, because you must learn some category theory concepts to understand monads deeply. There is syntactic sugar that lets you view monads as a kind of non-destructive assignment, but in my opinion it is better to spend more time on the basics of category theory (the notion of a category) to get a better understanding.
Before trying to program in OO style in Haskell, you should ask yourself whether you really use the object-oriented style in C#. Many programmers use OO languages, but their programs are written in the structured style.
The data declaration allows you to define data structures combining products (comparable to a struct in C) and unions (comparable to a union in C, but tagged with a constructor); the deriving part of the declaration lets the type pick up default method implementations for standard classes.
A data type (data structure) belongs to a class if it has an implementation of the set of methods in the class.
For example, if you can define a show :: a -> String method for your data type, then it belongs to the class Show, and you can declare your data type an instance of the Show class.
This is different from the use of class in some OO languages, where it is a way to define structures + methods.
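A short sketch of both routes into a class, using made-up types Contact and Point: one gets its instances from deriving, the other is made an instance of Show by hand.

```haskell
-- A union of two products; Eq and Show come from the deriving clause.
data Contact = Email String | Phone Int Int
  deriving (Eq, Show)

-- A product type with a hand-written Show instance.
data Point = Point Int Int

instance Show Point where
  show (Point x y) = "(" ++ show x ++ ", " ++ show y ++ ")"
```

Here show (Phone 1 2) is "Phone 1 2" from the derived instance, while show (Point 1 2) is "(1, 2)" from the hand-written one.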
A data type is abstract if it is independent of its implementation. You create, mutate, and destroy the object through an abstract interface; you do not need to know how it is implemented.
Haskell supports abstraction, and it is very easy to declare.
For example, this code from the Haskell site:
data Tree a = Nil
            | Node { left :: Tree a,
                     value :: a,
                     right :: Tree a }
This declares the selectors left, value, and right.
Constructor functions may be defined as follows if you want to add them to the export list in the module declaration:
node = Node
nil = Nil
Modules are built in a similar way to those in Modula. Here is another example from the same site:
module Stack (Stack, empty, isEmpty, push, top, pop) where
empty :: Stack a
isEmpty :: Stack a -> Bool
push :: a -> Stack a -> Stack a
top :: Stack a -> a
pop :: Stack a -> (a,Stack a)
newtype Stack a = StackImpl [a] -- opaque!
empty = StackImpl []
isEmpty (StackImpl s) = null s
push x (StackImpl s) = StackImpl (x:s)
top (StackImpl s) = head s
pop (StackImpl (s:ss)) = (s,StackImpl ss)
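Note that top and pop above fail on an empty stack. A possible variant, sketched here with the hypothetical names safeTop and safePop (these are not part of the quoted module), returns a Maybe instead:

```haskell
newtype Stack a = StackImpl [a]

empty :: Stack a
empty = StackImpl []

push :: a -> Stack a -> Stack a
push x (StackImpl s) = StackImpl (x : s)

-- Nothing for an empty stack, otherwise the top element.
safeTop :: Stack a -> Maybe a
safeTop (StackImpl [])      = Nothing
safeTop (StackImpl (x : _)) = Just x

-- Nothing for an empty stack, otherwise the top element and the rest.
safePop :: Stack a -> Maybe (a, Stack a)
safePop (StackImpl [])       = Nothing
safePop (StackImpl (x : xs)) = Just (x, StackImpl xs)
```

As in the original module, client code would only see these functions if StackImpl is left out of the export list.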
There is more to say about this subject, but I hope this comment helps!