Related
In statically typed languages, people can use algebraic data types to abstract data and also generate constructors, or use classes, traits, and mixins to deal with data abstraction.
Dynamically typed languages, like Python and Ruby, all provide a class system to their users.
But what about Scheme, the simplest functional language, the one closest to the λ-calculus: how does it abstract data?
Do Scheme programmers usually just put data in a list or a lambda abstraction and write some accessor functions to make it look like a tree or something else? Like EOPL says: specifying data via interfaces.
And then how does this abstraction technique relate to abstract data types (ADTs) and objects, with regard to On Understanding Data Abstraction, Revisited?
What SICP (and, I guess, EOPL) advocates is just using functions to access data; then you can always switch one set of functions for another, implementing the same named set of functions over another concrete representation. Those sets of functions form the "interfaces": you put them in different source files, and by just loading the appropriate one you can switch the concrete implementation while all the other code is none the wiser. That's what makes it an "abstract" data type.
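A rough sketch of that idea in Kotlin (the other kind of language discussed on this page; all names here are invented): the "interface" is the agreed-upon set of operations, and the concrete representation behind it can be swapped freely.

    // One agreed-upon set of operations; clients depend only on these.
    interface Tree {
        fun value(): Int
        fun children(): List<Tree>
    }

    // One concrete representation. A vector-based or lazily computed one
    // could be substituted without client code being any the wiser.
    class NodeTree(private val v: Int, private val kids: List<Tree> = emptyList()) : Tree {
        override fun value() = v
        override fun children() = kids
    }

    // Client code written against the operation set only.
    fun sum(t: Tree): Int = t.value() + t.children().sumOf(::sum)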
As for algebraic data types, the old bare-bones Scheme way is to create closures (which hold and hide the data) that respond to "messages" and thus become "objects" (search for "Scheme mailboxes"). This gives us products, i.e. records; functions we get for free from Scheme itself. For sum types, just as in C/C++, we can use tagged unions in a disciplined manner (or, again, hide the specifics behind a set of "interface" functions).
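For comparison, a minimal sketch of the message-passing style in Kotlin rather than Scheme (names invented here): a closure that hides its state and dispatches on a message.

    // A "mailbox" object: a closure that hides its state and answers messages.
    fun makeCounter(): (String) -> Int {
        var count = 0                           // hidden state, captured by the closure
        return { msg ->
            when (msg) {
                "inc" -> { count += 1; count }  // mutate, then report
                "get" -> count                  // just report
                else -> error("unknown message: $msg")
            }
        }
    }

    // Usage: the data is reachable only through the messages it understands.
    // val c = makeCounter(); c("inc"); c("get")  // -> 1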
EOPL has something called variant-case, which handles such sum types in a manner similar to pattern matching. Searching brings up e.g. this link, saying:
I'm using DrScheme w/ the EOPL textbook, which uses define-record and variant-case. I've got the macro definitions from the PLT site, but am now dealing with ...
so it seems relevant, as one example.
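As a loose analogy (not EOPL's actual macro), Kotlin's sealed classes give the same combination: tagged unions plus exhaustive, variant-case-like matching.

    import kotlin.math.PI

    // A sum type as a sealed hierarchy: the "tag" is the subclass.
    sealed class Shape
    data class Circle(val radius: Double) : Shape()
    data class Rect(val w: Double, val h: Double) : Shape()

    // The analogue of variant-case: one branch per variant, checked exhaustively.
    fun area(s: Shape): Double = when (s) {
        is Circle -> PI * s.radius * s.radius
        is Rect -> s.w * s.h
    }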
I've been learning Kotlin and have come across its Collections API. Before Kotlin I had been learning Java, and I know that in Java there are a lot of different types in the Collections API: for example, instead of the general List, Map, Queue, Set we use ArrayList, HashMap, LinkedList, LinkedHashMap, etc. In Kotlin we mostly use just the general types like Map, List, Set, but we can also use HashMap and the like. So, what's going on there? Can you help me figure it out?
While Kotlin's original and primary target is the JVM, there is a huge push by JetBrains to make it multiplatform, and support JS and Native as well.
If you're using Kotlin on the JVM, the implementations of any collections you're using will still be the original JDK classes, e.g. java.util.ArrayList or java.util.HashSet. These are not reimplemented by the Kotlin standard library, which has some great benefits:
- These are well-tested implementations, which are maintained anyway.
- Using the exact same classes makes interop with Java a breeze, as you can pass them back and forth without having to perform conversions or mapping of any kind.
What Kotlin does do is introduce its own collection semantics over these existing implementations, in the form of the standard library interfaces such as List, Map, MutableList, MutableMap and so on. A small bit of compiler magic makes it so that these interfaces are implemented by the existing JDK classes as well.
If you don't need a specific implementation of a certain type of collection, you can use your collections via these interfaces plus the respective factory methods of the standard library (listOf, mapOf, mutableListOf, mutableMapOf, etc.). This keeps your code more generic, and independent of the concrete underlying implementations. You don't know what specific class the standard library mutableListOf function will create for you, only that it will be an object that satisfies the contract of the MutableList interface.
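For illustration, a minimal sketch of that interface-first style; nothing in it names a concrete collection class:

    val readOnly: List<Int> = listOf(1, 2, 3)        // some unspecified List implementation
    val mutable: MutableList<Int> = mutableListOf()  // likewise, only the contract is known

    fun demo() {
        mutable.add(42)     // fine: MutableList exposes mutation
        // readOnly.add(42) // won't compile: List has no add()
    }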
You should basically use these interfaces by default in your code, especially in public API:
- In the case of function parameters, this lets clients provide the function with whatever implementation of the collection they wish to give you. If your function can operate on anything that's a List, you should ask for just that interface - no reason to require an ArrayList or LinkedList specifically.
- If this is a return type, using these interfaces lets you change the specific implementation that you create internally in the future, without breaking client code. You can promise to just return a MutableList of things, and what implementation backs that list is not exposed to your clients (see the sketch below).
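A small sketch of both points (function names invented for illustration):

    // Parameter: ask for the most general interface; any List implementation works.
    fun totalLength(words: List<String>): Int = words.sumOf { it.length }

    // Return type: promise only MutableList; the backing class can change later
    // without breaking callers.
    fun newBasket(): MutableList<String> = mutableListOf()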
If you look at all the collection handling functions of the Kotlin standard library, you'll see that on the surface, they almost exclusively operate on these interfaces. If you dig down deep enough, you'll find ArrayList instances being created, but this is not exposed to the client code, as it doesn't have to care about the concrete implementation most of the time.
Going back to the multiplatform point once more: if you write your code in such a way that it only relies on Kotlin standard library defined types, that code will be easily usable for non-JVM targets. If you reference kotlin.collections.MutableList in your imports, that can immediately compile to JS code, because there's a Kotlin standard library implementation of that interface on each platform. Whether that maps to an existing class directly, wraps an existing class somehow, or is implemented for Kotlin from scratch, again, doesn't have to concern you. But if you refer to java.util.TreeSet in your code, that won't fly for the JS target, as the Java platform classes are not available there.
Can you still use classes such as java.util.ArrayList directly? Of course.
- If you don't see your code going multiplatform at some point, using Java collections directly is perfectly okay.
- If you need a specific implementation for a List or a Set for performance reasons, sometimes you'll have to use the Java classes directly.
Interestingly, in recent releases of Kotlin, these specific types of implementations (such as an array-based list) are wrapped under standard library typealiases too, so that they're platform-independent by default: see kotlin.collections.ArrayList or kotlin.collections.HashSet for examples of this. These Kotlin-defined types will usually show up first in IntelliJ completion, so you'll find yourself being pushed towards using them wherever possible. The same goes for most exceptions, e.g. IllegalArgumentException.
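For example, a one-line sketch; on the JVM the typealias still compiles down to the original JDK class:

    // kotlin.collections.ArrayList is a typealias; on the JVM it is java.util.ArrayList.
    val names: kotlin.collections.ArrayList<String> = arrayListOf("a", "b")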
TL;DR: You can use either Kotlin collection types or Java types in Kotlin, but you should probably use the former whenever you can.
Objective-C’s objects are pretty flexible when compared to similar languages like C++ and can be extended at runtime via Categories or through runtime functions.
Any idea what this sentence means? I am relatively new to Objective-C
While technically true, it may be confusing to the reader to call category extension "at runtime." As Justin Meiners explains, categories allow you to add additional methods to an existing class without requiring access to the existing class's source code. The use of categories is fairly common in Objective-C, though there are some dangers. If two different categories add the same method to the same class, then the behavior is undefined. Since you cannot know whether some other part of the system (perhaps even a system library) adds a category method, you typically must add a prefix to prevent collisions (for example, rather than swappedString, a better name would likely be something like rnc_swappedString if this were part of RNCryptor).
As I said, it is technically true that categories are added at runtime, but from the programmer's point of view, categories are written as though they were just part of the class, so most people think of them as a compile-time choice. It is very rare to decide at runtime whether to add a category method or not.
As a beginner, you should be aware of categories, but slow to create new ones. Creating categories is a somewhat intermediate-level skill. It's not something to avoid, but not something you'll use every day. It's very easy to overuse them. See Justin's link for more information.
On the other hand, "runtime functions" really do add new functionality to existing classes or even specific objects at runtime, and are completely under the control of code. You can, at runtime, modify a class such that it responds to a method it didn't previously respond to. You can even generate entirely new classes at runtime that did not exist when the program was compiled, and you can change the class of existing objects. (This is exactly how Key-Value Observing is implemented.)
Modifying classes and objects using the runtime is an advanced skill. You should not even consider using these techniques in production code until you have significant experience. And when you have that experience, it will tell you that you very seldom want to do this anyway. You will know the runtime functions because they are C-based, with names like method_exchangeImplementations. You won't mistake them for normal ObjC (and you generally have to import objc/runtime.h to get to them).
There is a middle ground that bleeds into runtime manipulation, called message forwarding and dynamic message resolution. This is often used for proxy objects, and is implemented with -forwardingTargetForSelector:, +resolveInstanceMethod:, and some similar methods. These are tools that allow classes to modify themselves at runtime, which is much less dangerous than modifying other classes (i.e. "swizzling").
It's also important to consider how all of this translates to Swift. In general, Swift has discouraged and restricted the use of runtime class manipulation, but it embraces (and improves) category-like extensions. By the time you're experienced enough to dig into the runtime, you will likely find it an even more obscure skill than it is today. But you will use extensions (Swift's version of categories) in every program.
A category allows you to add functionality to an existing class that you do not have access to the source code for (system frameworks, 3rd-party APIs, etc.). This functionality is possible by adding methods to a class at runtime.
For example, let's say I wanted to add a method to NSString, called -swappedString, that swaps uppercase and lowercase letters. In static languages (such as C++), extending classes like this is more difficult: I would have to create a subclass of NSString (or a helper function). While my own code could take advantage of my subclass, any instance created in a library would not use my subclass and would not have my method.
Using categories, I can extend any class, such as adding a -swappedString method to NSString, and use it on any instance of the class transparently: [anyString swappedString];.
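Not Objective-C, but for comparison: Kotlin's extension functions (like the Swift extensions mentioned elsewhere on this page) give much the same flavor. A loose analogue of -swappedString, with the logic merely sketched:

    // Add a method to String without access to String's source code.
    fun String.swappedString(): String =
        map { c -> if (c.isUpperCase()) c.lowercaseChar() else c.uppercaseChar() }
            .joinToString("")

    // "Hello World".swappedString() == "hELLO wORLD"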
You can learn more details from Apple's docs.
I am reading an article on extension of interfaces at the following link:
http://wiki.hsr.ch/APF/files/ExtensionInterface.pdf
It says on page 142:
Over time the addition of these requests can bloat the interface with functionality not anticipated in the initial framework design. If new methods are added to the "universalComponent" interface directly, all client code must be updated and recompiled. This is tedious and error-prone.
My question is (assume we are developing in C++): why do we have to recompile client code if we add new methods to the interface without modifying any existing functions in the interface?
Thanks!
I haven't read the article, but for starters, I would suggest de-emphasizing the terms "method" and "interface" in C++. Those terms are popular in strict OO languages like Java, but C++ is a broader, multi-paradigm language.
With that said, "adding methods to interfaces" is really just adding more virtual member functions to a base class. Changing the base class changes the definition of all derived classes, and thus all code that requires the complete type of any derived class or of the base class must be recompiled.
C++ types are not a runtime feature. Types only exist at compile time, and the compiler must have full access to the type definitions. (Again in contrast to other languages!) The interface-implementation relationship exists purely at compile-time and cannot be "precompiled". So there's really no such thing as "modifying the interface" that would produce runtime-modularity. The "interface" concept is just a neat mnemonic that you can use when designing your application, but it does not save you from recompiling. Changing a class definition changes the internal representation of the class, and you cannot (in general) make a correct C++ program unless all parts of the program see the same class definitions.
Adding a method to a class that is involved in polymorphism (meaning it has at least one virtual member function) potentially changes the binary layout of objects of that class and its subclasses.
What is open recursion? Is it specific to OOP?
(I came across this term in this tweet by Daniel Spiewak.)
Just copying from http://www.comlab.ox.ac.uk/people/ralf.hinze/talks/Open.pdf:
"Open recursion Another handy feature offered by most languages with objects and classes is the ability for one method body to invoke another method of the same object via a special variable called self or, in some langauges, this. The special behavior of self is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first. "
This paper analyzes the possibility of adding OO to ML with regard to expressivity and complexity. It has the following excerpt on objects, which seems to make the term relatively clear:
3.3. Objects
The simplest form of object is just a record of functions that share a common closure environment that carries the object state (we can call these simple objects). The function members of the record may or may not be defined as mutually recursive. However, if one wants to support inheritance with overriding, the structure of objects becomes more complicated. To enable open recursion, the call-graph of the method functions cannot be hard-wired, but needs to be implemented indirectly, via object self-reference. Object self-reference can be achieved either by construction, making each object a recursive, self-referential value (the fixed-point model), or dynamically, by passing the object as an extra argument on each method call (the self-application or self-passing model). In either case, we will call these self-referential objects.
The name "open recursion" is a bit misleading at first, because it has nothing to do with the recursion that normally is used (a function calling itself); and to that extent, there is no closed recursion.
It basically means, that a thing is referring to itself. I can only guess, but I do think that the term "open" comes from open as in "open for extension".
In that sense an object is open to extension, but still referring to itself.
Perhaps a small example can shed some light on the concept.
Imagine you write a Python class like this one:
    class SuperClass:
        def method1(self):
            self.method2()

        def method2(self):
            print("SuperClass")
If you run this:
s = SuperClass()
s.method1()
It will print "SuperClass".
Now we create a subclass from SuperClass and override method2:
    class SubClass(SuperClass):
        def method2(self):
            print("SubClass")
and run it:
sub = SubClass()
sub.method1()
Now "SubClass" will be printed.
Still, we only call method1() as before. Inside method1(), method2() is called, but both are bound to the same reference (self in Python, this in Java). When SuperClass is subclassed, method2() is overridden, which means that an object of SubClass refers to a different version of that method.
That is open recursion.
In most cases, you override methods and call the overridden methods directly.
The scheme here uses an indirection through self-reference.
P.S.: I don't think this was invented; rather, it was discovered and then explained.
Open recursion allows one method of an object to invoke other methods of the same object from within, through a special variable like this or self.
In short, open recursion is about something that is actually not specific to OOP, but more general.
The relation to OOP comes from the fact that many typical "OOP" languages have such properties, but it is essentially not tied to any distinguishing feature of OOP.
So there are different meanings, even within the same "OOP" language. I will illustrate this later.
Etymology
As mentioned here, the terminology was likely coined in the famous TAPL by BCP, which illustrates the meaning with concrete OOP languages.
TAPL does not define "open recursion" formally. Instead, it points out the "special behavior of self (or this) is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first".
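To make that description concrete, a minimal Kotlin sketch (class and method names invented): the method defined in the base class ends up invoking one defined later, in a subclass, via the late-bound this.

    open class Base {
        fun greet() = println("Hello, ${name()}")  // implicitly this.name(), late-bound
        open fun name() = "Base"
    }

    class Derived : Base() {
        override fun name() = "Derived"            // defined later, in a subclass
    }

    fun main() {
        Derived().greet()                          // prints "Hello, Derived"
    }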
Nevertheless, neither "open" nor "recursion" comes from the OOP basis of a language. (Actually, it also has nothing to do with static types.) So the interpretation (or the informal definition, if any) in that source is overspecified in nature.
Ambiguity
The mention in TAPL clearly shows that the "recursion" is about "method invocation". However, it is not that simple in real languages, which usually do not have primitive semantic rules for the recursive invocation itself. Real languages (including the ones considered OOP languages) usually specify the semantics of such invocation via the notation of method calls. As syntactic devices, such calls are subject to the evaluation of some kind of expression, which relies on the evaluations of its subexpressions. These evaluations imply the resolution of the method name, under some independent rules. Specifically, such rules are about name resolution, i.e. determining the denotation of a name (typically a symbol, an identifier, or some "qualified" name expression) in the subexpression. Name resolution usually respects scoping rules.
OTOH, the "late-bound" property emphasizes how to find the target implementation of the named method. This is a shortcut in the evaluation of specific call expressions, but it is not general enough, because entities other than methods can also have such "special" behavior, which can even make such behavior not special at all.
A notable ambiguity comes from this insufficient treatment: what does a "binding" mean? Traditionally, a binding can be modeled as a pair of a (scoped) name and its bound value, i.e. a variable binding. In the special treatment of "late-bound" entities, the set of allowed entities is smaller: methods instead of all named entities. Besides considerably undermining the abstraction power of the language rules at the meta level (in the language specification), it does not remove the need for the traditional meaning of a binding (because there are other, non-method entities), hence the confusion. The use of "late-bound" is at least an instance of bad naming. Instead of "binding", a more proper name would be "dispatching".
Worse, the usage in TAPL directly mixes the two meanings when dealing with "recursion". The "recursion" behavior is all about finding the entity denoted by some name, and is not specific to method invocation (even in those OOP languages).
The title of the chapter (Case Study: Imperative Objects) also suggests some inconsistency. Obviously, the so-called late binding of method invocation has nothing to do with imperative state, because the resolution of the dispatching does not require mutable metadata for the invocation. (In the popular implementation technique, the virtual method table need not be modifiable.)
Openness
The use of "open" here looks like mimic to open (lambda) terms. An open term has some names not bound yet, so the reduction of such a term must do some name resolution (to compute the value of the expression), or the term is not normalized (never terminate in evaluation). There is no difference between "late" or "early" for the original calculi because they are pure, and they have the Church-Rosser property, so whether "late" or not does not alter the result (if it is normalized).
This is not the same in the language with potentially different paths of dispatching. Even that the implicit evaluation implied by the dispatching itself is pure, it is sensitive to the order among other evaluations with side effects which may have dependency on the concrete invocation target (for example, one overrider may mutate some global state while another can not). Of course in a strictly pure language there can be no observable differences even for any radically different invocation targets, a language rules all of them out is just useless.
Then there is another problem: why it is OOP-specific (as in TAPL)? Given that the openness is qualifying "binding" instead of "dispatching of method invocation", there are certainly other means to get the openness.
One notable instance is the evaluation of a procedure body in traditional Lisp dialects. There can be unbound symbols in the body and they are only resolved when the procedure being called (rather than being defined). Since Lisps are significant in PL history and the are close to lambda calculi, attributing "open" specifically to OOP languages (instead of Lisps) is more strange from the PL tradition. (This is also a case of "making them not special at all" mentioned above: every names in function bodies are just "open" by default.)
It is also arguable that the OOP style of self/this parameter is equivalent to the result of some closure conversion from the (implicit) environment in the procedure. It is questionable to treat such features primitive in the language semantics.
(It may be also worth noting, the special treatment of function calls from symbol resolution in other expressions is pioneered by Lisp-2 dialects, not any of typical OOP languages.)
More cases
As mentioned above, different meanings of "open recursion" may coexist in the same "OOP" language.
C++ is the first instance here, because there are sufficient reasons for them to coexist.
In C++, name resolution is entirely static, normatively called name lookup. The rules of name lookup vary across different scopes. Most of them are consistent with the identifier lookup rules in C (except that C allows implicit declarations and C++ does not): you must first declare a name, and only then can the name be looked up (lexically) later in the source code; otherwise the program is ill-formed (and the implementation of the language is required to issue an error). The strict dependency between names is considerably "closed", because there is no later chance to recover from the error, so you cannot directly have names mutually referencing each other across different declarations.
To work around the limitation, there can be additional declarations whose sole duty is to break the cyclic dependency. Such declarations are called "forward" declarations. Using forward declarations still does not require "open" recursion, because every well-formed use must statically see a previous declaration of the name, so no name lookup requires additional "late" binding.
However, C++ classes have special name lookup rules: some entities in the class scope can be referred to in contexts prior to their declaration. This makes mutually recursive use of names across different declarations possible without any additional "forward" declarations to break the cycle. This is exactly the "open recursion" in the TAPL sense, except that it is not about method invocation.
Moreover, C++ does have "open recursion" as per the description in TAPL: the this pointer and virtual functions. The rules to determine the target (the final overrider) of a virtual function call are independent of the name lookup rules. A non-static member defined in a derived class generally just hides entities with the same name in the base classes. The dispatching rules kick in only on virtual function calls, after name lookup (the order is guaranteed since evaluations of C++ function calls are strict, or applicative). It is also easy to introduce a base-class name with a using-declaration, without worrying about the kind of the entity.
Such a design can be seen as an instance of separation of concerns. The name lookup rules allow some generic static analysis in the language implementation without special treatment of function calls.
OTOH, Java has some more complex rules that mix name lookup with other rules, including the rules for identifying overriders. Name shadowing in Java subclasses is specific to the kind of entity. It is more complicated to distinguish overriding from overloading/shadowing/hiding/obscuring for the different kinds. There is also no counterpart to C++'s using-declarations in the definitions of subclasses. Such complexity does not make Java more or less "OOP" than C++, anyway.
Other consequences
Collapsing the binding of name resolution and the dispatching of method invocation leads not only to ambiguity, complexity, and confusion, but also to more difficulties at the meta level. Here "meta" refers to the fact that name bindings can expose properties that are not only available in the source language semantics, but also subject to the meta languages: either the formal semantics of the language or its implementation (say, the code implementing an interpreter or a compiler).
For example, as in traditional Lisps, binding time can be distinguished from evaluation time, because program properties revealed at binding time (value bindings in the immediate contexts) are closer to meta properties than evaluation-time properties are (like the concrete value of an arbitrary object). An optimizing compiler can decide how to deploy code generation depending on binding-time analysis, either statically at compile time (when the body is to be evaluated more than once) or deferred to runtime (when the compilation is too expensive). There is no such option in a language that blindly assumes all resolutions in closed recursion are faster than open ones (and that even makes them syntactically different in the first place). In this sense, OOP-specific open recursion is not just not as handy as advertised in TAPL, but a premature optimization: it gives up metacompilation too early, not in the language implementation, but in the language design.