In Chapter 36.4 of HTDP (How to Design Programs),
I found this warning:
Warning: The state variable is never a parameter of a function.
But as far as I've heard, in functional programming a function is compromised if it refers to state variables: it is no longer a pure function. Such functions are hard to test, behave unpredictably, cannot be memoized, and so on. State variables should therefore be passed in as parameters, not just referred to like global constants.
So I wonder:
is HTDP arguing something wrong,
are global state variables allowed in some functional programming practices, or
do I have the wrong idea?
Thanks in advance.
Disclaimer: I like and respect this book very much and have learned a lot from it. I would actually like to recommend it to my friends (if any), so please don't take this the wrong way.
I don't think there's anything incompatible with what you've heard about functional programming and what is written in the chapter you linked. However, you're conflating two concepts here: the presence of mutable state in functional programs (a purity issue) vs. the order in which things are evaluated, and the restrictions on the syntax you have available to write things down.
Consider: if you're using an eager evaluation strategy, then passing a "state variable" of the kind they describe in that chapter would have the effect of dereferencing it, and you would get the value of the variable as the function argument. Similarly, if the variable was bound as a parameter to the function, you would get a different bit of memory at every call. There are many different options here. The fact that some languages permit you to pass references around as values is not universal.
So they are really just describing global variables (or variables that are accessed from some parent scope), which by their very nature need not be passed to functions as parameters. If the specific language permits pass-by-reference, this might not be such a clear distinction.
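The difference can be sketched in Python, which evaluates arguments eagerly. This is only an illustration, not code from HTDP: the names `counter`, `bump_param` and `bump_state` are made up, and a Python module-level variable stands in for the book's state variable.

```python
counter = 0  # hypothetical module-level state variable


def bump_param(c):
    # `c` is a fresh local binding that starts out holding the value
    # of the argument; rebinding it does not touch `counter`.
    c = c + 1
    return c


def bump_state():
    # The function refers to the state variable directly, so the
    # assignment is visible to every later use of `counter`.
    global counter
    counter = counter + 1
    return counter


result_param = bump_param(counter)  # passes the value 0
after_param = counter               # still 0
result_state = bump_state()         # updates the state variable
after_state = counter               # now 1
```

Passing `counter` as an argument only hands over its current value, so a function that is supposed to change the state has to refer to the variable itself.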
I found some explanations of open/closed recursion, but I do not understand why the definition contains the word "recursion", or how it compares with dynamic/static dispatching. Among the explanations I found, there are:
Open recursion. Another handy feature offered by most
languages with objects and classes is the ability for one method
body to invoke another method of the same object via a special
variable called self or, in some languages, this. The special
behavior of self is that it is late-bound, allowing a method defined
in one class to invoke another method that is defined later, in
some subclass of the first. [Ralf Hinze]
... or in Wikipedia:
The dispatch semantics of this, namely that method calls on this are dynamically dispatched, is known as open recursion, and means that these methods can be overridden by derived classes or objects. By contrast, direct named recursion or anonymous recursion of a function uses closed recursion, with early binding.
I also read the StackOverflow question: What is open recursion?
But I do not understand why the word "recursion" is used in the definition. Of course, it can lead to interesting (or dangerous) side effects if one uses "open recursion" by making... a recursive method call. But the definitions do not take recursive method/function calls directly into account (apart from the "closed recursion" in the Wikipedia definition, but that sounds strange since "open recursion" does not refer to recursive calls).
Do you know why there is the word "recursion" in the definition? Is it because it is based on another computer science definition that I am not aware of? Should simply saying "dynamic dispatch" not be enough?
I tried to start writing an answer here and then ended up writing an entire blog post about it. The TL;DR is:
So, if you compare a real object-oriented language to a simpler language with just structures and functions, the differences are:
All of the methods can see and call each other. The order they are defined doesn’t matter since their definitions are “simultaneous” or mutually recursive.
The base methods have access to the derived receiver object (i.e. this or self in other languages) so they don’t close over just each other. They are open to overridden methods.
Thus: open recursion.
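As a rough illustration of where the word "recursion" earns its place, here is a Python sketch (not taken from the blog post; all class names are made up). In `OpenFib` the recursive calls go through `self`, so an override sees every recursive step. In `ClosedFib` the recursive calls name the function directly, so an override only sees the outermost call.

```python
class OpenFib:
    def fib(self, n):
        if n < 2:
            return n
        # Open: the recursive calls are dispatched through `self`.
        return self.fib(n - 1) + self.fib(n - 2)


class ClosedFib:
    def fib(self, n):
        if n < 2:
            return n
        # Closed: the recursive calls are hard-wired to this definition.
        return ClosedFib.fib(self, n - 1) + ClosedFib.fib(self, n - 2)


class CountingMixin:
    """Hypothetical helper: counts the calls that reach the override."""

    def __init__(self):
        self.calls = 0

    def fib(self, n):
        self.calls += 1
        return super().fib(n)


class CountingOpen(CountingMixin, OpenFib):
    pass


class CountingClosed(CountingMixin, ClosedFib):
    pass


open_counter = CountingOpen()
open_result = open_counter.fib(10)      # 55, override entered 177 times

closed_counter = CountingClosed()
closed_result = closed_counter.fib(10)  # 55, override entered once
```

Both objects compute the same value, but only the open version lets the derived class take part in the recursion itself.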
In object oriented programming languages when you define a variable it ends up becoming a reference to an object. The variable is not itself the object, and instead points to the object that carries the value that was assigned to that variable.
The question is: how does this work so efficiently? What is the mechanism of how a variable is assigned to an object?
The way I think about the organization is as a linked list; however, I could not find references on how the data is structured in languages such as Ruby or Java.
In object oriented programming languages when you define a variable it ends up becoming a reference to an object.
This is not always true. For example, C++ can be considered an object-oriented language, yet a user of the language can use a variable as a reference/pointer or explicitly as a value.
However, you are right in that some (typically higher-level) OO languages implicitly use references so that the user of the language does not have to worry about these kinds of implementation "details" with regard to performance. They try to take responsibility for this instead.
how does this work so efficiently? What is the mechanism of how a variable is assigned to an object?
Consider a simple example. What happens when an object is passed as a parameter to a function? A copy of that object must be made so that the function can refer to that object locally. For an OO language that implicitly uses references, only the address of the object needs to be copied, whereas a true pass-by-value would require a copy of the complete memory contents of the object, which could potentially be very large (think a collection of objects or similar).
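Python is one of the languages that implicitly uses references, so the difference can be observed directly. This is a small sketch with made-up names (`append_marker`, `big`, `snapshot`); it says nothing about how Ruby or Java lay out objects internally.

```python
import copy


def append_marker(items):
    # `items` is just another name for the caller's list object;
    # only the reference was copied into the call, not the list.
    items.append("marker")
    return items


big = list(range(1000))
returned = append_marker(big)
same_object = returned is big    # True: no copy of the list was made
len_after_call = len(big)        # 1001: the caller sees the mutation

# Emulating pass-by-value means copying the whole contents first.
snapshot = copy.deepcopy(big)
append_marker(snapshot)
len_snapshot = len(snapshot)     # 1002
len_big = len(big)               # still 1001
```

The call itself costs the same regardless of how large the list is, while the explicit deep copy has to visit every element.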
A detailed explanation of this involves getting into the guts of assembly. For example, why does a copy need to be made at all when an object is passed to a function call? Why does the indirection of an address not take longer than a direct value? Etc.
Related
What's the difference between passing by reference vs. passing by value?
I'm studying the Lisp language (to write Lisp routines). In a general context I know what a routine is, but in a technical context I can't talk about it, because I'm only starting to learn routines now. So, what's the real definition of routine?
(I've already "googled" this but didn't find anything.)
The term routine derives from subroutine, which is a more common term in languages like BASIC where one actually creates SUBroutines. (BASIC actually had a difference between a SUBroutine and a FUNCTION, but nevertheless...)
From the Wikipedia entry:
In computer science, a subroutine (also called procedure, function, routine, method, or subprogram) is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code.
As the name "subprogram" suggests, a subroutine behaves in much the same way as a computer program that is used as one step in a larger program or another subprogram. A subroutine is often coded so that it can be started ("called") several times and/or from several places during a single execution of the program, including from other subroutines, and then branch back (return) to the next instruction after the "call" once the subroutine's task is done.
Different languages/environments/eras have different ecosystems and thus different terms to describe the same general concept. I generally only use the term function (or method in an "OOP" environment) these days.
Happy coding.
For fun I have Community Wiki'ed. The list below hopefully covers which term(s) is (are) "correct" (widely accepted) to use in a given language to mean routine. Informally, routine is used in the context of all the languages below, so it should be omitted unless it is the de facto term used. Feel free to add, correct, and annotate as appropriate.
C - function
Java - method. While function is also often used, the term function does not appear in the Java Language Specification.
C# - method and function. In the specification, functions refer to function-objects and anonymous functions. They are not the same as methods, which are members of types (classes or structures). Also consider delegates.
JavaScript - function or method. Methods are functions accessed via a property of an object.
Haskell - function. This is the accepted terminology.
Scala - function or method. Method if def member of type, functions are first-class values.
BASIC - function or subroutine. Subroutines do not return values. Supports call-by-reference.
FORTRAN - function or subroutine. Subroutines do not return values. Supports call-by-reference.
LISP - function. DEFUN -> DEfineFUNction, all forms are valid expressions. Also consider macros, which are not themselves functions but are arguably routines.
VHDL - subprograms: functions and procedures. Procedures have no return value.
Smalltalk - method
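Python - function or method. Methods are functions defined in a class body and called through an instance or the class.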
Ruby - method (often interchanged with function? lambdas/Procs may be considered different?)
Perl - function and subroutine. There is only one form to declare a function/SUBroutine so there is no distinction w.r.t. return values. Using method (for object-bound functions) seems less prevalent than in other languages.
Pascal - procedures and functions
Ada - procedures and functions
You can't find a technical definition because there isn't a technical definition specific to lisp. A 'routine', outside of vaudeville, is just another name for a function. While it's been many years since I programmed in Lisp full-time, no one ever used that term in any formal way, or even used it commonly. We talked about 'functions', 'macros', and 'forms.' If someone said, 'oh, there's a routine to calculate how many apples in a pie' it was perfectly informal.
What is open recursion? Is it specific to OOP?
(I came across this term in this tweet by Daniel Spiewak.)
just copying http://www.comlab.ox.ac.uk/people/ralf.hinze/talks/Open.pdf:
"Open recursion. Another handy feature offered by most languages with objects and classes is the ability for one method body to invoke another method of the same object via a special variable called self or, in some languages, this. The special behavior of self is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first."
This paper analyzes the possibility of adding OO to ML, with regard to expressivity and complexity. It has the following excerpt on objects, which seems to make this term relatively clear:
3.3. Objects
The simplest form of object is just a record of functions that share a common closure environment that
carries the object state (we can call these simple objects). The function members of the record may or may not
be defined as mutually recursive. However, if one wants to support inheritance with overriding, the structure
of objects becomes more complicated. To enable open recursion, the call-graph of the method functions
cannot be hard-wired, but needs to be implemented indirectly, via object self-reference. Object self-reference
can be achieved either by construction, making each object a recursive, self-referential value (the fixed-point
model), or dynamically, by passing the object as an extra argument on each method call (the self-application
or self-passing model). In either case, we will call these self-referential objects.
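The excerpt can be approximated in Python with plain dictionaries standing in for records of functions. This is only a sketch under that assumption; `make_simple`, `make_self_passing` and `send` are invented names and do not come from the paper. Object state is left out to keep the call graph in focus.

```python
def make_simple():
    # Simple object: the call graph is hard-wired through the closure.
    def name():
        return "base"

    def describe():
        return "I am " + name()  # always the `name` defined just above

    return {"name": name, "describe": describe}


def make_self_passing():
    # Self-passing model: every method receives the object as an
    # extra argument and looks other methods up through it.
    def name(self):
        return "base"

    def describe(self):
        return "I am " + self["name"](self)

    return {"name": name, "describe": describe}


def send(obj, method):
    # Hypothetical helper that performs the self-application.
    return obj[method](obj)


# "Inheritance with overriding": copy the record, replace one entry.
derived_simple = dict(make_simple(), name=lambda: "derived")
derived_open = dict(make_self_passing(), name=lambda self: "derived")

simple_result = derived_simple["describe"]()  # "I am base"
open_result = send(derived_open, "describe")  # "I am derived"
```

With the hard-wired call graph the override of `name` is invisible to `describe`; with self-passing, `describe` picks it up.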
The name "open recursion" is a bit misleading at first, because it has nothing to do with recursion in the usual sense (a function calling itself); and to that extent, there is no closed recursion.
It basically means that a thing refers to itself. I can only guess, but I think that the term "open" comes from open as in "open for extension".
In that sense an object is open to extension, but still referring to itself.
Perhaps a small example can shed some light on the concept.
Imagine you write a Python class like this one:
class SuperClass:
    def method1(self):
        self.method2()

    def method2(self):
        print(self.__class__.__name__)
If you run this with
s = SuperClass()
s.method1()
it will print "SuperClass".
Now we create a subclass from SuperClass and override method2:
class SubClass(SuperClass):
    def method2(self):
        print(self.__class__.__name__)
and run it:
sub = SubClass()
sub.method1()
Now "SubClass" will be printed.
Still, we only call method1() as before. Inside method1(), method2() is called, and both are looked up through the same reference (self in Python, this in Java). Subclassing SuperClass replaces method2(), which means that an object of SubClass refers to a different version of this method.
That is open recursion.
In most cases, you override methods and call the overridden methods directly.
This scheme here is using an indirection over self-reference.
P.S.: I don't think this has been invented but discovered and then explained.
Open recursion allows a method to call other methods of the same object from within, through a special variable like this or self.
In short, open recursion is actually about something not specific to OOP, but more general.
The relation to OOP comes from the fact that many typical "OOP" PLs have this property, but it is essentially not tied to any distinguishing feature of OOP.
So there are different meanings, even in the same "OOP" language. I will illustrate this later.
Etymology
As mentioned here, the term was likely coined in the famous TAPL (Types and Programming Languages) by BCP (Benjamin C. Pierce), which illustrates the meaning with concrete OOP languages.
TAPL does not define "open recursion" formally. Instead, it points out the "special behavior of self (or this) is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first".
Nevertheless, neither "open" nor "recursion" comes from the OOP basis of a language. (Actually, it also has nothing to do with static types.) So the interpretation (or the informal definition, if any) in that source is overspecified in nature.
Ambiguity
The mention in TAPL clearly shows that "recursion" is about "method invocation". However, it is not that simple in real languages, which usually do not have primitive semantic rules for the recursive invocation itself. Real languages (including the ones considered OOP languages) usually specify the semantics of such invocation in terms of the notation of method calls. As syntactic devices, such calls are subject to the evaluation of some kind of expression, which relies on the evaluation of its subexpressions. These evaluations imply the resolution of the method name under some independent rules. Specifically, such rules are about name resolution, i.e. determining the denotation of a name (typically a symbol, an identifier, or some "qualified" name expression) in the subexpression. Name resolution usually follows scoping rules.
OTOH, the "late-bound" property emphasizes how to find the target implementation of the named method. This is a shortcut for the evaluation of specific call expressions, but it is not general enough, because entities other than methods can also have such "special" behavior, which can even make the behavior not special at all.
A notable ambiguity comes from this insufficient treatment: what does a "binding" mean? Traditionally, a binding can be modeled as a pair of a (scoped) name and its bound value, i.e. a variable binding. In the special treatment of "late-bound" ones, the set of allowed entities is smaller: methods instead of all named entities. Besides considerably undermining the abstraction power of the language rules at the meta level (in the language specification), it does not remove the need for the traditional meaning of a binding (because there are other, non-method entities), hence the confusion. The use of "late-bound" is at least an instance of bad naming. Instead of "binding", a more proper name would be "dispatching".
Worse, the use in TAPL directly mixes the two meanings when dealing with "recursion". The "recursion" behavior is all about finding the entity denoted by some name, and is not specific to method invocation (even in those OOP languages).
The title of the chapter (Case Study: Imperative Objects) also suggests some inconsistency. Obviously, the so-called late binding of method invocation has nothing to do with imperative state, because resolving the dispatch does not require mutable metadata about the invocation. (In a popular implementation strategy, the virtual method table need not be modifiable.)
Openness
The use of "open" here looks like it mimics open (lambda) terms. An open term has some names that are not bound yet, so the reduction of such a term must do some name resolution (to compute the value of the expression), or the term is not normalized (its evaluation never terminates). There is no difference between "late" and "early" for the original calculi because they are pure and have the Church-Rosser property, so whether resolution is "late" or not does not alter the result (if the term is normalized).
This is not the case in a language with potentially different dispatch paths. Even if the implicit evaluation implied by the dispatch itself is pure, it is sensitive to its order relative to other evaluations with side effects, which may depend on the concrete invocation target (for example, one overrider may mutate some global state while another may not). Of course, in a strictly pure language there can be no observable differences of this kind even for radically different invocation targets, but a language that rules all of them out is just useless.
Then there is another problem: why is it OOP-specific (as in TAPL)? Given that the openness qualifies "binding" rather than "dispatching of method invocation", there are certainly other means to get the openness.
One notable instance is the evaluation of a procedure body in traditional Lisp dialects. There can be unbound symbols in the body, and they are only resolved when the procedure is called (rather than when it is defined). Since Lisps are significant in PL history and they are close to lambda calculi, attributing "open" specifically to OOP languages (instead of Lisps) is all the more strange from the point of view of PL tradition. (This is also a case of "making them not special at all" mentioned above: every name in a function body is just "open" by default.)
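The same call-time resolution of free names can be observed in Python, which serves here only as a stand-in for the Lisp behavior described above. The names `greet` and `current_name` are made up for the sketch.

```python
def greet():
    # `current_name` is a free name in this body. It is not bound
    # when `greet` is defined; it is resolved on each call.
    return "Hello, " + current_name()


failed_before_binding = False
try:
    greet()  # nothing is bound to `current_name` yet
except NameError:
    failed_before_binding = True


def current_name():
    return "first"


first = greet()   # "Hello, first"


def current_name():  # rebinding the name changes what `greet` calls
    return "second"


second = greet()  # "Hello, second"
```

No classes, `self` or `this` are involved, yet `greet` is "open" to later (re)definitions of the name it refers to.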
It is also arguable that the OOP-style self/this parameter is equivalent to the result of some closure conversion of the (implicit) environment in the procedure. It is questionable to treat such features as primitive in the language semantics.
(It may also be worth noting that treating function calls differently from symbol resolution in other expressions was pioneered by Lisp-2 dialects, not by any of the typical OOP languages.)
More cases
As mentioned above, different meanings of "open recursion" may coexist in the same "OOP" language.
C++ is the first instance here, because there are sufficient reasons to make them coexist.
In C++, name resolution is entirely static; normatively, it is called name lookup. The rules of name lookup vary between scopes. Most of them are consistent with the identifier lookup rules in C (except that implicit declarations are allowed in C but not in C++): you must first declare the name, and only then can the name be looked up later in the source code (lexically); otherwise the program is ill-formed (and the implementation of the language is required to issue an error). This strict requirement on the dependency of names can be considered "closed", because there is no later chance to recover from the error, so you cannot directly have names mutually referenced across different declarations.
To work around the limitation, there can be additional declarations whose sole duty is to break the cyclic dependency. Such declarations are called "forward" declarations. Using forward declarations still does not require "open" recursion, because every well-formed use must statically see a previous declaration of that name, so no name lookup requires additional "late" binding.
However, C++ classes have special name lookup rules: some entities in class scope can be referred to in contexts prior to their declaration. This makes mutually recursive use of names across different declarations possible without any additional "forward" declarations to break the cycle. This is exactly "open recursion" in the TAPL sense, except that it is not about method invocation.
Moreover, C++ does have "open recursion" as per the description in TAPL: the this pointer and virtual functions. The rules that determine the target (overrider) of a virtual function call are independent of the name lookup rules. A non-static member defined in a derived class generally just hides the entities with the same name in the base classes. The dispatching rules kick in only on virtual function calls, after name lookup (the order is guaranteed since the evaluation of C++ function calls is strict, or applicative). It is also easy to introduce a base class name with a using-declaration without worrying about the type of the entity.
Such a design can be seen as an instance of separation of concerns. The name lookup rules allow some generic static analysis in the language implementation without special treatment of function calls.
OTOH, Java has more complex rules that mix name lookup with other rules, including how to identify the overriders. Name shadowing in Java subclasses is specific to the kind of entity. It is more complicated to distinguish overriding from overloading/shadowing/hiding/obscuring for different kinds. There is also no counterpart to C++'s using-declarations in the definition of subclasses. Such complexity does not make Java more or less "OOP" than C++, anyway.
Other consequences
Collapsing the bindings for name resolution and the dispatching of method invocation leads not only to ambiguity, complexity and confusion, but also to more difficulties at the meta level. Here meta refers to the fact that name binding can expose properties that are not only available in the source language semantics, but also subject to the meta languages: either the formal semantics of the language or its implementation (say, the code that implements an interpreter or a compiler).
For example, as in traditional Lisps, binding time can be distinguished from evaluation time, because program properties revealed at binding time (value bindings in the immediate contexts) are closer to meta properties than evaluation-time properties (like the concrete value of arbitrary objects). An optimizing compiler can perform code generation depending on the binding-time analysis either statically at compile time (when the body is to be evaluated more than once) or deferred to run time (when the compilation is too expensive). There is no such option for languages that blindly assume all resolutions in closed recursion are faster than open ones (and even make them syntactically different from the very start). In this sense, OOP-specific open recursion is not just not as handy as advertised in TAPL, but a premature optimization: giving up metacompilation too early, not in the language implementation, but in the language design.