In statically typed language, people are able to use algebraic data type to abstract data and also generate constructors, or use class, trait and mixin to deal with data abstraction.
In dynamically typed language, like Python and Ruby, they all provide a class system to users.
But what about scheme, the simplest functional language, the closest one to λ-calculi, how does it abstract data?
Do scheme programmers usually just put data in a list or a lambda abstraction, and write some accessor function to make it look like a tree or something else? like EOPL says: specifying data via interfaces.
And then how does this abstraction technique relate to abstract data type (ADT) and objects? with regard to On understanding data abstraction, revisited.
What SICP (and I guess, EOPL) is advocating is just using functions to access data; then you can always switch one set of functions for another, implementing the same named set of functions to work with another concrete implementation. And that (i.e. the sets of such functions) is what forms the "interfaces", and that's what you put in different source files, and by just loading the appropriate one you can switch the concrete implementation while all the other code is none the wiser. That's what makes it "abstract" datatype.
As for the algebraic data types, the old bare-bones Scheme way is to create closures (that hold and hide the data) which respond to "messages" and thus become "objects" (something about "Scheme mailboxes"). This gives us products, i.e. records, and functions we get for free from Scheme itself. For sum types, just as in C/C++, we can use tagged unions in a disciplined manner (or, again, hide the specifics behind a set of "interface" functions).
EOPL has something called "variant-case" which handles such sum types in a manner similar to pattern matching. Searching brings up e.g. this link saying
I'm using DrScheme w/ the EOPL textbook, which uses define-record and variant-case. I've got the macro definitions from the PLT site, but am now dealing with ...
so seems relevant, as one example.
Related
I have read a lot about abstract data types (ADTs) and I'm askig myself if there are non-abstract/ concrete datatypes?
There is already a question on SO about ADTs, but this question doesn't cover "non-abstract" data types.
The definition of ADT only mentions what operations are to be
performed but not how these operations will be implemented
reference
So a ADT is hiding the concrete implementation from the user and "only" offers a bunch of permissible operations/ methods; e.g., the Stack in Java (reference). Only methods like pop(), push(), empty() are visible and the concrete implementation is hidden.
Following this argumentation leads me to the question, if there is a "non-abstract" data type?
Even a primitive data type like java.lang.Integer has well defined operations, like +, -, ... and according to wikipedia it is a ADT.
For example, integers are an ADT, defined as the values …, −2, −1, 0, 1, 2, …, and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc.,
reference
The java.lang.Integer is not a primitive type. It is an ADT that wraps the primitve java type int. The same holds for the other Java primitive types and the corresponding wrappers.
You don't need OOP support in a language to have ADTs. If you don't have support, you establish conventions for the ADT in the code you write (i.e. you only use it as previoulsy defined by the operations and possible values of the ADT)
That's why ADT's predate the class and object concepts present in OOP languages.They existed before. Statements like class just introduced direct support in the languages, allowing compilers to check what you are doing with the ADTs.
Primitive types are just values that can be stored in memory, without any other associated code. They don't know about themselves or their operations. And their internal representation is known by external actors, unlike the ADTs. Just like the possible operations. These are manipulations to the values done externally, from the outside.
Primitive types carry with them, although you don't necessary see it, implementation details relating the CPU or virtual machine architecture. Because they map to CPU available register sizes and instructions that the CPU executes directly. Hence the maximum integer value limits, for example.
If I am allowed to say this, the hardware knows your primitive types.
So your non-abstract data types are the primitive types of a language,
if those types aren't themselves ADT's too. If they happen to be ADTs,
you probably have to create them (not just declare them; there will
be code setting up things in memory, not only the storage in a certain
address), so they have an identity, and they usually offer methods
invoked through that identity, that is, they know about themselves.
Because in some languages everything is an object, like in Python, the
builtin types (the ones that are readily available with no
need to define classes) are sometimes called primitive too, despite
being no primitive at all by the above definition.
Edit:
As mentioned by jaco0646, there is more about concrete/abstract
words in OOP.
An ADT is already an abstraction. It represents a category
of similar objects you can instantiate from.
But an ADT can be even more abstract, and is referred as such (as
opposed to concrete data types) if you declare it with no intention of
instantiating objects from it. Usually you do this because other "concrete"
ADTs (the ones you instantiate) inherit from the "abstract" ADT. This allows the sharing and extension of behaviour between several different ADTs.
For example you can define an API like that, and make one or more different
ADTs offer (and respect) that API to their users, just by inheritance.
Abstract ADTs maybe defined by you or be available in language types or
libraries.
For example a Python builtin list object is also a collections.abc.Iterable.
In Python you can use multiple inheritance to add functionality like that.
Although there are other ways.
In Java you can't, but you have interfaces instead, and can declare a class to implement one or more interfaces, besides possibly extending another class.
So an ADT definition whose purpose is to be directly instantiated, is a
concrete ADT. Otherwise it is abstract.
A closely related notion is that of an abstract method in a class.
It is a method you don't fill with code, because it is meant to be filled by children classes that should implement it, respecting its signature (name and parameters).
So depending on your language you will find possible different (or similar) ways of implementing this concepts.
I agree with the answer from #progmatico, but I would add that concrete (non-abstract) data types include more than primitives.
In Java, Stack happens to be a concrete data type, which extends another concrete data type Vector, which extends an ADT AbstractList.
The interfaces implemented by AbstractList are also ADTs: Iterable, Collection, List.
My Problem
I have a class with just a few fields but which represents a relatively complicated data structure. This class is central in my program and over time I found myself adding more and more functionality into it, making things a mess. Since (almost) all of its methods rely on its internal fields, I could not think of a way to move some of the methods elsewhere, even though most methods are independent of each other. How can I refactor this class to make it simpler and reduce the number of methods which are directly implemented in it?
More Information
The class in question represents a sort of automaton. It supports a ton of operations such as retrieving information about it, performing various binary operations between it and other automata, querying for specific information stored inside it, saving it to file, etc. Almost all of these operations depend on the precise implementation of the class - in my specific case I maintain an edge-set-based implementation, but other implementations were also used in the past and might be used again in the future.
Except for a narrow set of basic helper methods which are commonly used, most methods are independent of each other.
The language I am using is Java, but I'm hoping for general answers which could be applied to any statically-typed, object-oriented language.
What I've Tried
I tried refactoring it somehow to multiple types, but each of its operations require access to most of its fields, and I'm hesitant about migrating these operations elsewhere because I can't think of a way to do that without exposing the class's implementation.
I'm also not sure where I should migrate the operations to, assuming they are indeed independent of the implementation. An external utility class? An abstract base type? Will appreciate any input about this.
Perhaps you could remodel the data that your class holds, so that instead of holding the data directly, it holds objects that hold the data? Then you could move the methods that manipulate that data into the new classes, leaving the original class as a sort of container / dispatcher class.
I'm currently enjoying the transition from an object oriented language to a functional language. It's a breath of fresh air, and I'm finding myself much more productive than before.
However - there is one aspect of OOP that I've not yet seen a satisfactory answer for on the FP side, and that is polymorphism. i.e. I have a large collection of data items, which need to be processed in quite different ways when they are passed into certain functions. For the sake of argument, let's say that there are multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.
In OOP that can be handled relatively well using polymorphism: either through composition+inheritance or a prototype-based approach.
In FP I'm a bit stuck between:
Writing or composing pure functions that effectively implement polymorphic behaviours by branching on the value of each data item - feels rather like assembling a huge conditional or even simulating a virtual method table!
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
What are the recommended functional approaches for this kind of situation? Are there other good alternatives?
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
If virtual method dispatch is the way you want to approach the problem, this is a perfectly reasonable approach. As for separating functions from data, that is a distinctly non-functional notion to begin with. I consider the fundamental principle of functional programming to be that functions ARE data. And as for your feeling that you're simulating a virtual function, I would argue that it's not a simulation at all. It IS a virtual function table, and that's perfectly OK.
Just because the language doesn't have OOP support built in doesn't mean it's not reasonable to apply the same design principles - it just means you'll have to write more of the machinery that other languages provide built-in, because you're fighting against the natural spirit of the language you're using. Modern typed functional languages do have very deep support for polymorphism, but it's a very different approach to polymorphism.
Polymorphism in OOP is a lot like "existential quantification" in logic - a polymorphic value has SOME run-time type but you don't know what it is. In many functional programming languages, polymorphism is more like "universal quantification" - a polymorphic value can be instantiated to ANY compatible type its user wants. They're two sides of the exact same coin (in particular, they swap places depending on whether you're looking at a function from the "inside" or the "outside"), but it turns out to be extremely hard when designing a language to "make the coin fair", especially in the presence of other language features such as subtyping or higher-kinded polymorphism (polymorphism over polymorphic types).
If it helps, you may want to think of polymorphism in functional languages as something very much like "generics" in C# or Java, because that's exactly the type of polymorphism that, e.g., ML and Haskell, favor.
Well, in Haskell you can always make a type-class to achieve a kind of polymorphism. Basically, it is defining functions that are processed for different types. Examples are the classes Eq and Show:
data Foo = Bar | Baz
instance Show Foo where
show Bar = 'bar'
show Baz = 'baz'
main = putStrLn $ show Bar
The function show :: (Show a) => a -> String is defined for every data type that instances the typeclass Show. The compiler finds the correct function for you, depending on the type.
This allows to define functions more generally, for example:
compare a b = a < b
will work with any type of the typeclass Ord. This is not exactly like OOP, but you even may inherit typeclasses like so:
class (Show a) => Combinator a where
combine :: a -> a -> String
It is up to the instance to define the actual function, you only define the type - similar to virtual functions.
This is not complete, and as far as I know, many FP languages do not feature type classes. OCaml does not, it pushes that over to its OOP part. And Scheme does not have any types. But in Haskell it is a powerful way to achieve a kind of polymorphism, within limits.
To go even further, newer extensions of the 2010 standard allow type families and suchlike.
Hope this helped you a bit.
Who said
defining pure functions separately from data
is best practice?
If you want polymorphic objects, you need objects. In a functional language, objects can be constructed by glueing together a set of "pure data" with a set of "pure functions" operating on that data. This works even without the concept of a class. In this sense, a class is nothing but a piece of code that constructs objects with the same set of associated "pure functions".
And polymorphic objects are constructed by replacing some of those functions of an object by different functions with the same signature.
If you want to learn more about how to implement objects in a functional language (like Scheme), have a look into this book:
Abelson / Sussman: "Structure and Interpration of Computer programs"
Mike, both your approaches are perfectly acceptable, and the pros and cons of each are discussed, as Doc Brown says, in Chapter 2 of SICP. The first suffers from having a big type table somewhere, which needs to be maintained. The second is just traditional single-dispatch polymorphism/virtual function tables.
The reason that scheme doesn't have a built-in system is that using the wrong object system for the problem leads to all sorts of trouble, so if you're the language designer, which to choose? Single despatch single inheritance won't deal well with 'multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.'
To synopsize, there are many ways of constructing objects, and scheme, the language discussed in SICP, just gives you a basic toolkit from which you can construct the one you need.
In a real scheme program, you'd build your object system by hand and then hide the associated boilerplate with macros.
In clojure you actually have a prebuilt object/dispatch system built in with multimethods, and one of its advantages over the traditional approach is that it can dispatch on the types of all arguments. You can (apparently) also use the heirarchy system to give you inheritance-like features, although I've never used it, so you should take that cum grano salis.
But if you need something different from the object scheme chosen by the language designer, you can just make one (or several) that suits.
That's effectively what you're proposing above.
Build what you need, get it all working, hide the details with macros.
The argument between FP and OO is not about whether data abstraction is bad, it's about whether the data abstraction system is the place to stuff all the separate concerns of the program.
"I believe that a programming language should allow one to define new data types. I do not believe that a program should consist solely of definitions of new data types."
http://www.haskell.org/haskellwiki/OOP_vs_type_classes#Everything_is_an_object.3F nicely discusses some solutions.
When I have a series of processes which are similar in nature but work on slightly different types of objects, do I unify the type of work in a single utility class, or do I put the functionality directly on each object that will need to utilized the functionality?
I'm not concerned about a specific case per-se, but I'm most curious about what factors go into this decision.
I think it depends on the class-ancestry of your objects, the real difference in logic between each object, and the future-possible-need to do this on some other kind of class.
It sounds to me like the utility class is a good way to go if the functionaliy you're applying to multiple classes is largely the same, and could be applied to future classes down the road.
if on the other hand the functionality is different enoguh that you'd end up with a big switch/case statement in your utility class to accomodate the differnet object types, you might want to implement it in the objects themselves.
You have two approaches to your problem: One is to use generic programming (horizontal polymorphism) or to attack it using a more traditional vertical hierarchy based implementation.
You decisions have to be based in the type of similarities that are shared among the various data types. In the case that we can define a complete and orthogonal contract that can be operated in any type then we can easily decide to use generics.
This for example is the case with List, Dictionary and all the class under the System.Collections.Generic name space that eventually replaced their corresponding non generic counterparts of the early versions of .NET.
In the other hand, again from the .NET world we can use as an example of a vertical hierarchy, the UserControl class than derives from ContainerControl and serves as the base for other controls specializing it behavior using its virtual methods...
In most of the cases though the design of your class hierarchy involves a lot of judgment calls that are not always to defined deterministically as they rely more on your experience and talent as a developer rather in a concrete model that can be applied across the board in every possible situation..
This is probably a newbie question since I'm new to design patterns but I was looking at the Template Method and Strategy DP's and they seem very similar. I can read the definitions, examine the UML's and check out code examples but to me it seem like the Strategy pattern is just using the Template Method pattern but you just happen to passing it into and object (i.e. composition).
And for that matter the Template Method seems like that is just basic OO inheritance.
Am I missing some key aspect to their differences? Am I missing something about the Template Method that makes it more that just basic inheritance?
Note: There is a previous post on this (672083) but its more on when to use it, which kind of helps me get it a bit more but I want valid my thoughts on the patterns themselves.
It basically all comes down to semantics. The strategy pattern allows you to pass in a particular algorithm/procedure (the strategy) to another object and that will use it. The template method allows you to override particular aspects of an algorithm while still keeping certain aspects of it the same (keep the order the same, and have things that are always done at the start and end for example... the 'template') while inheritance is a way of modelling 'IS-A' relationships in data models.
Certainly, template methods are most easily implemented using inheritance (although you could just as easily use composition, especially once you have functors), and strategy patterns are frequently also template methods but where the syntax is similar the meanings are vastly different.
The Strategy design pattern
provides a way to exchange the algorithm of an object
dynamically at run-time
(via object composition).
For example, calculating prices in an order processing system.
To calculate prices in different ways,
different pricing algorithms can be supported
so that which algorithm to use can be selected (injected) and exchanged dynamically at run-time.
The Template Method
design pattern
provides a way to
redefine some parts of the behavior of a class statically at compile-time
(via subclassing).
For example, designing reusable applications (frameworks).
The application implements the common (invariant) parts of the behavior
so that users of the application can write subclasses to redefine
the variant parts to suit their needs.
But subclass writers should neither be able to change the invariant parts of
the behavior nor the behavior's structure
(the structure of invariant and variant parts).