I am currently working on a simulation of a physical system in Fortran 90, with something like 50 million particles. Each has a position x (to simplify).
For now, I am using a 1D vector that contains the position of each particle, and when I have to iterate over every particle, I just go through that vector (having taken care to sort the particles to limit cache misses).
I am now considering creating a particle class. But what about access to its position as I iterate? Will it be as fast as in the previous case?
So, what does the compiler do to store the attributes of an object? And, a fortiori, what about the case with more than one attribute?
Thank you for your time.
On "how are derived types stored":
The Fortran Standard requires the components of a sequence type to be stored in contiguous memory, in the components' declaration order. Sequence types are those declared with a SEQUENCE statement, which implies that the type shall have at least one component, that each component shall be of an intrinsic or sequence type, that the type shall not be parameterized or extensible, and that it cannot have type-bound procedures. If you want this behavior and your type is suitable, make it a sequence type (you may want to take data alignment into consideration).
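As a rough illustration of "contiguous, in declaration order" (using Python's ctypes as a stand-in, since it lays a structure out the way a C struct or a Fortran sequence type is laid out; the component names here are invented):

import ctypes

class Particle(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double),
                ("tag", ctypes.c_int32)]

# Components land at increasing offsets, in declaration order:
print(Particle.x.offset, Particle.y.offset, Particle.tag.offset)  # 0 8 16
# Alignment rules may add tail padding:
print(ctypes.sizeof(Particle))  # typically 24, not 20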
On the other hand, the Fortran Standard does not state how compilers must organize storage for non-sequence derived types. That's not bad at all, as compilers are free to optimize storage. Most of the time you may expect almost the same as for sequence types: things stored contiguously whenever possible (padding may apply). Arrays and strings are always contiguous. Pointer and allocatable components are only a reference, for obvious reasons, and their targets lie somewhere else.
From the Standard:
A structure resolves into a sequence of components. Unless the structure includes a SEQUENCE statement, the use of this terminology in no way implies that these components are stored in this, or any other, order. Nor is there any requirement that contiguous storage be used. The sequence merely refers to the fact that in writing the definitions there will necessarily be an order in which the components appear, and this will define a sequence of components. This order is of limited significance because a component of an object of derived type will always be accessed by a component name except in the following contexts: the sequence of expressions in a derived-type value constructor, intrinsic assignment, the data values in namelist input data, and the inclusion of the structure in an input/output list of a formatted data transfer, where it is expanded to this sequence of components. Provided the processor adheres to the defined order in these cases, it is otherwise free to organize the storage of the components for any nonsequence structure in memory as best suited to the particular architecture.
On "Is it faster to have a derived type than independent arrays":
As @VladimirF said in a comment, it's a broad topic: it depends highly on how you are accessing and operating on your data, and it has been asked and answered before (check the links in his comment). You may find a lot about it around (link1, link2), and I'll add this one on "cache blocking" that may interest you.
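To make the access-pattern point concrete, here is a minimal sketch of the two layouts (NumPy standing in for Fortran; the field names are invented):

import numpy as np

n = 1_000_000
# Array of structures: fields interleaved, one 24-byte record per particle.
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])
# Structure of arrays: each field is its own contiguous block.
soa_x = np.zeros(n)
soa_y = np.zeros(n)
soa_z = np.zeros(n)

# A sweep that only needs x reads 8 useful bytes out of every 24 in the
# interleaved layout, but streams straight through memory in the other:
aos["x"] += 1.0
soa_x += 1.0

Which layout wins depends on whether your hot loops touch one field at a time or all of them together.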
I am making a Mathematics web program which allows the user to compute and prove various quantities or statements, e.g. determinant of a matrix, intersection of sets, determine whether a given map is a homomorphism. I decided to write the code using the OOP paradigm (in PHP, to handle some of the super heavy computations that a user's browser might not appreciate), since I could easily declare sets as Set objects, matrices as Matrix objects, etc. and keep some of the messy details of determining things such as cardinality, determinants, etc. in the background. However, after getting knee-deep in code, I'm wondering if deciding on OOP was a mistake. Here's why.
I'll use my Matrix class as a simple example. Matrix has the following attributes:
name (type String) (stores name of this matrix)
size (type array) (stores # rows and # columns of this matrix)
entries (type array) (stores this matrix's entries)
is_invertible (type Boolean) (stores whether this matrix can be inverted)
determinant (type Int) (stores the determinant of this matrix)
transpose (type array) (stores the transpose of this matrix)
Creating a new matrix called A would be done like so:
$A = new Matrix("A");
Now, in a general math problem concerning matrices, it could be that we know the matrix's name, size, entries, whether it's invertible, its determinant, or its transpose, or any combination of the above. This means that all of these properties need to be accessible, and certainly any of these properties can be changed by the user, depending on what's given in the problem. (I can give examples of problems for any of these cases, if needed.)
The issue I'm having, then, is that this would break the encapsulation "rule" of OOP ("rule" in quotes since, from what I understand, it's not a hard-and-fast rule, just one that should be upheld to the greatest extent possible). I did some searching on when getters and setters should be used, or even IF they should be used (seems odd to me that they wouldn't, in an OOP setting...), but this did not seem to help me much, as I found many contradictory answers and case-specific opinions.
So, my overall questions are: when the user needs access to modify many (if not all) of an object's attributes, but a class-oriented design seems to be ideal for addressing the programming problem,
Is OOP the best way to structure the code, despite essentially ignoring encapsulation altogether?
Is there an alternative to OOP which allows high user access while maintaining the OO "flavor" (i.e. keeping sets, matrices, etc. as objects)?
Is it ok to break the encapsulation rule altogether once in a while, if the problem calls for it? Or is that not in the spirit of OOP?
What you are trying to do is not necessarily outside the scope of OOP. The thing is that you have a different model than what would usually be described in programming textbooks (where, for example, the values of the matrix would always be present and all of the functions could be simple methods). (Perhaps this is why the question was unfairly downvoted.) Nothing prevents you from storing values like is_invertible internally and implementing setter and getter methods. Doing this might make sense if you are trying to learn OOP, but I think other problems (see coding textbooks) might be easier for learning purposes.

I see that a remote goal would be to capture some of mathematics as an OOP framework. But the whole mathematical universe is immensely richer than any fixed architecture (results like Gödel's theorem put a theoretical limit on this). You can only succeed in developing a framework for a very narrow application, for example solving certain kinds of equations. That's what symbolic algebra programs do: you can look at how, for example, SymPy, or perhaps parts of Maple and Mathematica, are implemented.

In my view, the OOP paradigm can be both very useful and too restrictive or unnecessary, depending on the task (you can certainly find more about the shortcomings of OOP on Wikipedia or elsewhere). Also, your problem can be seen as writing a small programming language - in many of them you have sets, numbers, etc. as objects.
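For example, a minimal sketch of that getter/setter idea (in Python rather than PHP, with invented names): derived attributes such as the determinant are computed lazily from the entries when possible, but can also be asserted directly when the problem states them.

class Matrix:
    def __init__(self, name, entries=None):
        self.name = name
        self.entries = entries       # may be unknown in a given problem
        self._determinant = None     # cached or user-asserted derived value

    @property
    def determinant(self):
        # Compute lazily when the entries are known and nothing was asserted.
        if self._determinant is None and self.entries is not None:
            self._determinant = self._compute_determinant()
        return self._determinant

    @determinant.setter
    def determinant(self, value):
        # A problem may state det(A) even when the entries are unknown.
        self._determinant = value

    def _compute_determinant(self):
        raise NotImplementedError  # LU decomposition, cofactor expansion, ...

A = Matrix("A")
A.determinant = 5     # given by the problem; entries still unknown
print(A.determinant)  # 5

Encapsulation still does useful work here: the class guards the consistency between what is known and what is derived, even though every attribute remains reachable.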
You can use only rudimentary OOP or no OOP at all. You can use functional programming.
You should Google/read more about this on this or other sites. Is it OK to sometimes walk across the road when the red traffic light is on?
I wonder whether the concept of multiple dispatch (that is, built-in support, as if the dynamic dispatch of virtual methods were extended to the method's arguments as well) should be included in an object-oriented language, provided its impact on performance were negligible.
Problem
Consider the following scenario: I have a -- not necessarily flat -- class hierarchy containing types of animals. At different locations in my code, I want to perform some actions on an animal object. I do not care, nor can I control, how this object reference is obtained. I might encounter it by traversing a list of animals, or it might be given to me as one of a method's arguments. The action I want to perform should be specialized depending on the runtime type of the given animal. Examples of such actions would be:
Construct a view-model for the animal in order to present it in the GUI.
Construct a data object (to later store into the DB) representing this type of animal.
Feed the animal with some food, but give different kinds of food depending on the type of the animal (whatever is healthier for it).
All of these examples operate on the public API of an animal object, but what they do is not the animal's own business, and therefore cannot be put into the animal itself.
Solutions
One "solution" would be to perform type checks. But this approach is error-prone and uses reflective features, which (in my opinion) is almost always an indication of bad design. Types should be a compile-time concept only.
Another solution would be to "abuse" (sort of) the visitor pattern to mimic double dispatch. But this would require that I change my animals to accept a visitor.
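For concreteness, the visitor shape would look something like this (a Python sketch with invented animal and visitor names):

class Dog:
    def accept(self, visitor):          # the intrusive part: every animal
        return visitor.visit_dog(self)  # must forward to "its own" method

class Cat:
    def accept(self, visitor):
        return visitor.visit_cat(self)

class FeedingVisitor:
    def visit_dog(self, dog):
        return "kibble"
    def visit_cat(self, cat):
        return "fish"

print(Dog().accept(FeedingVisitor()))   # kibble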
I am sure there are other approaches. Also, the problem of extension should be addressed: If new types of animals join the party, how many code locations need to be adapted, and how can I find them reliably?
The Question
So, in the light of these requirements, shouldn't multiple dispatch be an integral part of any well-designed object-oriented language?
Isn't it natural to make external (not just internal) actions dependent on the dynamic type of a given object?
Best regards!
You are suggesting dynamic dispatching based on method name / signature combined with runtime actual argument types. I think you're crazy.
So, in the light of these requirements, shouldn't multiple dispatch be an integral part of any well-designed object-oriented language?
That there are problems for which the availability of the kind of dispatch strategy you envision would simplify coding is a weak argument for such dispatch being built into any given language, much less every OO language.
Isn't it natural to make external (not just internal) actions dependent on the dynamic type of a given object?
Perhaps, but not everything that seems "natural" is in fact a good idea. Clothes are not natural, for instance, but see what happens if you try going around in public without (somewhere other than Berkeley, anyway).
Some languages already have static dispatch based on argument types, more conventionally called "overloading". Dynamic dispatch based on argument types, on the other hand, is a real mess if there is more than one argument to be considered, and it cannot help but be slow(er). Today's popular OO languages provide for you to perform double dispatch where it is wanted, without the overhead of supporting it in the vast majority of places where you don't want it.
Furthermore, although implementing double-dispatch does present maintenance issues arising from tight coupling between separate components, there are coding strategies that can help keep that manageable. It is anyway unclear to what extent having argument-based multiple dispatch built in to a given language would actually help with that problem.
One "solution" would be to perform type checks. But this approach is
error-prone and uses reflective features, which (in my opinion) is
almost always an indication of bad design. Types should be a
compile-time concept only.
You're wrong. All uses of virtual functions, virtual inheritance, and such things involve reflective features and dynamic types. The ability to defer typing until runtime when you need to is absolutely critical, and it is inherent in even the most basic formulation of the situation you're in, which literally cannot even arise without the use of dynamic types. You even describe your problem as wanting to do different things depending on... the dynamic type. After all, if there were no dynamic typing, why would you need to do things differently? You would already know the concrete final type.
Of course, a bit of run-time typing can handle the problem you got yourself into with run-time typing.
Simply build a dictionary/hash table from type to function. You can add entries to this structure dynamically for any dynamically linked derived types, it's a nice O(1) lookup, and it requires no special language support.
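A minimal sketch of that table (Python; the animal types and handler functions are invented, and this version looks up the exact concrete type, so subclasses would need their own entries):

class Dog: pass
class Cat: pass

handlers = {}                     # concrete type -> handler function

def register(cls, fn):
    # Can be called at any time, e.g. when a dynamically loaded
    # module introduces a new animal type.
    handlers[cls] = fn

def feed(animal):
    return handlers[type(animal)](animal)   # O(1) hash lookup

register(Dog, lambda a: "kibble")
register(Cat, lambda a: "fish")
print(feed(Dog()))                # kibble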
If one restricts oneself to the situation where knowledge of how an object of type X should fnorble an object of type Y must be stored in either class X or class Y, one can have the base type of Y include a method that accepts a reference of X's base type and indicates how much an object knows about how to be fnorbled by the object identified by that reference, as well as a method that asks the Y to have an X fnorble it.
Having done that, one can have X's Fnorble(Y) method start by asking the Y how much it knows about being fnorbled by a particular type of X. If the Y knows more about X than X knows about Y, then X's Fnorble(Y) method should call the Y's BeFnorbledBy(X) method; otherwise, the X should fnorble the Y as best it knows how.
Depending upon how many different kinds of X and Y there are, Y could define BeFnorbledBy overloads for different kinds of X, such that when X calls target.BeFnorbledBy(this) it would automatically dispatch directly to a suitable method; such an approach, however, would require every Y to know about every type of X that was "interesting" to anybody, whether or not it had any interest in that particular type itself.
Note that this approach doesn't accommodate the situation where there might be an outside object of class Z which knows things about how an X should fnorble a Y that neither X nor Y knows directly. That kind of situation is best handled by having a "rulebook" object: everything that knows about how various kinds of X should fnorble various kinds of Y can tell the rulebook, and code which wants an X to fnorble a Y can ask the rulebook to make that happen. Although languages could provide assistance in cases where rulebooks are singletons, there may be times when it is useful to have multiple rulebooks. The semantics in those cases are probably best handled by having code use rulebooks directly.
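A sketch of such a rulebook (Python; the types and the rule are invented), keyed by the pair of concrete types so that an outside party can register knowledge that neither class holds:

class X: pass
class Y: pass

class Rulebook:
    def __init__(self):
        self._rules = {}                    # (type, type) -> rule function

    def teach(self, xtype, ytype, rule):
        # Anyone, including some outside class Z, may register a rule.
        self._rules[(xtype, ytype)] = rule

    def fnorble(self, x, y):
        return self._rules[(type(x), type(y))](x, y)

book = Rulebook()
book.teach(X, Y, lambda x, y: "Z's rule for X fnorbling Y")
print(book.fnorble(X(), Y()))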
Does functional programming use variables?
If not, how do functional programs occupy memory?
Both functional programs and imperative programs (C#, Java) use variables, but they define them differently.
In functional programs, variables are like those in mathematics: once a value has been assigned, it cannot change.
In imperative languages it is typical that the values held by variables can be changed.
In both cases variables use memory.
If you're asking about implementation details for various methods of compiling functional programs, you probably need to start with reading "Implementing functional languages: a tutorial". It is a bit out of date (e.g., it does not cover the modern STG approach), but still valuable. Another, even older text to read is Field, Harrison, "Functional programming" (never mind the title, it's mostly about implementing FP compilers).
Pure functional programming uses no variables, but possibly constants in the C sense (that is, assigned only once, but at runtime).
Functional programs occupy memory with the function call "stack", i.e. the current expression and the arguments of recursively called functions.
Does functional programming use variables?
Well, at least you can bind names to values. One can call such a name a variable, even if it is not variable. But in math, when we see:
x + 3 = 5
we call x a variable, though it is just another name for 2.
On the other hand, the names that are bound to the arguments of functions are indeed variable, if only across different invocations of the function.
If not, how do functional programs occupy memory?
There will be language elements to construct non-primitive values, like lists, tuples, etc. Such a data constructor creates new values from old ones (somewhere in memory, but those details are irrelevant for FP).
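As a tiny illustration of such a data constructor (Python standing in for an FP language; cons/nil are the usual invented names):

def cons(head, tail):
    return (head, tail)     # a fresh, immutable cell in memory

nil = None
xs = cons(1, cons(2, cons(3, nil)))   # the list [1, 2, 3]
ys = cons(0, xs)                      # [0, 1, 2, 3]: one new cell, tail shared
# xs is untouched: no value was ever overwritten, memory grows by new cells.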
I'm currently enjoying the transition from an object oriented language to a functional language. It's a breath of fresh air, and I'm finding myself much more productive than before.
However - there is one aspect of OOP that I've not yet seen a satisfactory answer for on the FP side, and that is polymorphism. That is, I have a large collection of data items, which need to be processed in quite different ways when they are passed into certain functions. For the sake of argument, let's say that there are multiple factors driving the polymorphic behaviour, so there are potentially exponentially many different behaviour combinations.
In OOP that can be handled relatively well using polymorphism: either through composition+inheritance or a prototype-based approach.
In FP I'm a bit stuck between:
Writing or composing pure functions that effectively implement polymorphic behaviours by branching on the value of each data item - feels rather like assembling a huge conditional or even simulating a virtual method table!
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
What are the recommended functional approaches for this kind of situation? Are there other good alternatives?
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
If virtual method dispatch is the way you want to approach the problem, this is a perfectly reasonable approach. As for separating functions from data, that is a distinctly non-functional notion to begin with. I consider the fundamental principle of functional programming to be that functions ARE data. And as for your feeling that you're simulating a virtual function, I would argue that it's not a simulation at all. It IS a virtual function table, and that's perfectly OK.
Just because the language doesn't have OOP support built in doesn't mean it's not reasonable to apply the same design principles - it just means you'll have to write more of the machinery that other languages provide built-in, because you're fighting against the natural spirit of the language you're using. Modern typed functional languages do have very deep support for polymorphism, but it's a very different approach to polymorphism.
Polymorphism in OOP is a lot like "existential quantification" in logic - a polymorphic value has SOME run-time type but you don't know what it is. In many functional programming languages, polymorphism is more like "universal quantification" - a polymorphic value can be instantiated to ANY compatible type its user wants. They're two sides of the exact same coin (in particular, they swap places depending on whether you're looking at a function from the "inside" or the "outside"), but it turns out to be extremely hard when designing a language to "make the coin fair", especially in the presence of other language features such as subtyping or higher-kinded polymorphism (polymorphism over polymorphic types).
If it helps, you may want to think of polymorphism in functional languages as something very much like "generics" in C# or Java, because that's exactly the type of polymorphism that, e.g., ML and Haskell, favor.
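For instance, a sketch of that "universal" flavor using Python's typing module (first is an invented example function):

from typing import TypeVar, List

T = TypeVar("T")

def first(xs: List[T]) -> T:   # "for all T": the caller picks the type
    return xs[0]

print(first([1, 2, 3]))    # here T is instantiated to int
print(first(["a", "b"]))   # here T is instantiated to str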
Well, in Haskell you can always make a typeclass to achieve a kind of polymorphism. Basically, you define functions that are implemented differently for different types. Examples are the classes Eq and Show:
data Foo = Bar | Baz

instance Show Foo where
    show Bar = "bar"
    show Baz = "baz"

main = putStrLn $ show Bar
The function show :: (Show a) => a -> String is defined for every data type that is an instance of the typeclass Show. The compiler picks the correct implementation for you, depending on the type.
This allows you to define functions more generally, for example:
compare a b = a < b
will work with any type in the typeclass Ord. This is not exactly like OOP, but you may even "inherit" typeclasses (via a superclass constraint), like so:
class (Show a) => Combinator a where
    combine :: a -> a -> String
It is up to the instance to define the actual function; you only declare its type - similar to virtual functions.
This is not complete, and as far as I know many FP languages do not feature typeclasses. OCaml does not; it pushes that over to its OOP side. And Scheme has no static types at all. But in Haskell typeclasses are a powerful way to achieve a kind of polymorphism, within limits.
To go even further, extensions beyond the Haskell 2010 standard allow type families and the like.
Hope this helped you a bit.
Who said
defining pure functions separately from data
is best practice?
If you want polymorphic objects, you need objects. In a functional language, objects can be constructed by gluing together a set of "pure data" with a set of "pure functions" operating on that data. This works even without the concept of a class. In this sense, a class is nothing but a piece of code that constructs objects with the same set of associated "pure functions".
And polymorphic objects are constructed by replacing some of those functions of an object by different functions with the same signature.
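A small sketch of that construction (Python closures here; SICP builds the same thing in Scheme, and the names are invented):

def make_animal(name, speak):
    state = {"name": name}            # the "pure data"

    def dispatch(message):            # the "pure functions" glued on top
        if message == "name":
            return state["name"]
        if message == "speak":
            return speak()
        raise ValueError(f"unknown message: {message}")

    return dispatch

# Polymorphism: same signature, different function plugged in at construction.
dog = make_animal("Rex", lambda: "woof")
cat = make_animal("Tom", lambda: "meow")
print(dog("speak"), cat("speak"))     # woof meow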
If you want to learn more about how to implement objects in a functional language (like Scheme), have a look into this book:
Abelson / Sussman: "Structure and Interpretation of Computer Programs"
Mike, both of your approaches are perfectly acceptable, and the pros and cons of each are discussed, as Doc Brown says, in Chapter 2 of SICP. The first suffers from having a big type table somewhere, which needs to be maintained. The second is just traditional single-dispatch polymorphism / virtual function tables.
The reason that Scheme doesn't have a built-in object system is that using the wrong object system for the problem leads to all sorts of trouble, so if you're the language designer, which do you choose? Single-dispatch, single-inheritance objects won't deal well with 'multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.'
To summarize: there are many ways of constructing objects, and Scheme, the language discussed in SICP, just gives you a basic toolkit from which you can construct the one you need.
In a real Scheme program, you'd build your object system by hand and then hide the associated boilerplate with macros.
In Clojure you actually have a prebuilt object/dispatch system built in with multimethods, and one of its advantages over the traditional approach is that it can dispatch on the types of all its arguments. You can (apparently) also use the hierarchy system to get inheritance-like features, although I've never used it, so you should take that cum grano salis.
But if you need something different from the object scheme chosen by the language designer, you can just make one (or several) that suits.
That's effectively what you're proposing above.
Build what you need, get it all working, hide the details with macros.
The argument between FP and OO is not about whether data abstraction is bad, it's about whether the data abstraction system is the place to stuff all the separate concerns of the program.
"I believe that a programming language should allow one to define new data types. I do not believe that a program should consist solely of definitions of new data types."
http://www.haskell.org/haskellwiki/OOP_vs_type_classes#Everything_is_an_object.3F nicely discusses some solutions.