Dynamic binding implementations in OO languages

What are the two ways used to implement dynamic binding in OO languages? For example, how is dynamic binding implemented in a pure OO language like Smalltalk versus a mixed OO language such as C++?

I don't know if there are two ways, and I don't know if it's pure vs. mixed OO languages.
Basically, C++ uses virtual-table dispatch: each instance carries a table of functions (in C++ it actually carries a pointer to the table, but that detail doesn't change the dispatch mechanism), and when you call x.my_method(), the compiler knows that my_method is, say, the second method of the object, so it emits code that jumps through the second pointer in the virtual method table.
In dynamic languages (and Objective-C), dispatch is usually done through a dictionary of methods kept somewhere: at runtime, the name my_method is looked up, and whatever function is found is executed.
There are mixed approaches - in COM, you look up the interface by identifier, then you execute a method from its vtable.
Sometimes code such as a switch statement (e.g. switching on the type of the object) is generated to speed up the second approach.
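For concreteness, here is a small C++ sketch of both mechanisms side by side (my own illustration, not taken from any particular runtime): a hand-rolled "vtable" of function pointers indexed by slot, and a name-to-function dictionary that is searched at call time.
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>

// Approach 1: vtable-style dispatch. The "class" is a table of function
// pointers; a call compiles down to "jump through slot N of the table".
struct Object;
using Method = void (*)(Object&);

struct VTable { Method slots[2]; };

struct Object {
    const VTable* vtable;
    int state;
};

void greet(Object&)          { std::puts("hello from slot 0"); }
void my_method(Object& self) { std::printf("slot 1, state=%d\n", self.state); }

const VTable object_vtable = {{ greet, my_method }};

// Approach 2: dictionary dispatch. Methods are found by name at runtime,
// roughly the way a dynamic language (or objc_msgSend) resolves a selector.
struct DynObject {
    std::unordered_map<std::string, std::function<void(DynObject&)>> methods;
    int state = 0;
    void send(const std::string& name) { methods.at(name)(*this); }
};

int main() {
    Object o{&object_vtable, 7};
    o.vtable->slots[1](o);                 // "x.my_method()": indexed jump, no name lookup

    DynObject d;
    d.state = 7;
    d.methods["my_method"] = [](DynObject& self) {
        std::printf("looked up by name, state=%d\n", self.state);
    };
    d.send("my_method");                   // name resolved at runtime
}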

Related

Why does COM interface return different values for same invoking method?

When I invoke a method on a COM interface that returns another interface, the punkVal is different each time.
Yet if I use the old punkVals to invoke that interface's methods, it too works. It seems like a lot of unnecessary objects (or probably pointers to objects) are being created, but I need some way to determine if the returned interface is unique. All I know is that Invoke is returning an interface (punkVal) and the value is different on every call. The value pointed to by that pointer is always the same, but I think that's because it points to the vtable or something, so it doesn't seem to be a reliable check. That, or even seemingly disparate interfaces are all actually the same interface.
To be clear:
someCOMInterface foo();
I call Invoke on foo and expect punkVal to be someCOMInterface, which I must later query to call its methods using Invoke. But each time I call the first Invoke I get a different someCOMInterface (the "same" interface, but "different" in that the value returned by Invoke differs).
This isn't uncommon. It is entirely up to the developer of the COM library whether multiple calls to the same method return the same interface pointer or not.
One of the reasons that different pointers may be returned is that the core object model used within a specific COM library may not be COM. I have, for example, written object models in C++ using things like shared_ptr, which arguably yields a better object model for C++ clients. But when I expose my C++ object model for interoperability (or, more generally, expose it as a DLL), COM is often a better choice. Since keeping a complex, hierarchical object model in sync with a set of wrapper classes can be difficult, wrapper objects may just be temporary -- created as necessary and destroyed whenever clients no longer use them.
In these circumstances, the client may still need to know that the objects are "equal" -- two different objects that wrap the same inner object can be considered "equal." To determine that, Microsoft defines the IObjectEquality interface. This interface may be implemented by COM wrapper classes so that a client can explicitly check if two non-equal pointers are conceptually "equal" objects. The objects you're using may or may not implement this interface. This blog post shows a complete example of determining object equality using this interface.
If IObjectEquality is not implemented, it is up to the developer of the COM object to provide some means of making such a determination, usually by providing some sort of Name or ID or other identifying property. For example, Excel's Application.Range property will return different pointers from subsequent calls with the same arguments. To determine if two ranges are equal, you can use the Range.Address method to get an "identifier" of that range, and then compare those identifiers.
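For completeness, COM itself does define one reliable identity test: query both pointers for IUnknown and compare the results. The helper below is a minimal sketch of that rule (the function name is illustrative; this is not code from the library being discussed). In the wrapper scenario described above this test will report that the pointers are different objects, which is exactly why IObjectEquality or an identifying property is needed instead.
#include <windows.h>
#include <unknwn.h>

// Per COM rules, QueryInterface(IID_IUnknown) on the same underlying object
// must always return the same pointer, so comparing those pointers tells you
// whether two interface pointers share one COM identity.
bool SameComIdentity(IUnknown* a, IUnknown* b)
{
    if (!a || !b)
        return a == b;

    IUnknown* ua = nullptr;
    IUnknown* ub = nullptr;
    if (FAILED(a->QueryInterface(IID_IUnknown, reinterpret_cast<void**>(&ua))) ||
        FAILED(b->QueryInterface(IID_IUnknown, reinterpret_cast<void**>(&ub))))
    {
        if (ua) ua->Release();
        return false;
    }

    const bool same = (ua == ub);
    ua->Release();
    ub->Release();
    return same;
}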

objective-c: is method calling inside a class still with selectors or pure calls?

I am comparing different programming languages and development platforms. One important difference between Objective-C and other languages is that it uses selectors and messages, calling objc_msgSend on every method call, crossing a shared-library boundary, and thus introducing a measurable overhead, plus further overhead (small in the cached case) for objc_msgSend's internal work, as explained here:
http://www.mulle-kybernetik.com/artikel/Optimization/opti-3.html
I am not an expert so I have to trust this article, found on this forum.
In the official Apple guide I find that "dot syntax" is just "syntactic sugar"; that is, the same method-calling mechanism is used.
Question: I would like to know if, instead, pure calls are performed inside a class instance, where it would be a waste to call objc_msgSend. That is, when one of an instance's methods is called from another method of the same instance.
Thanks
Consider this situation in a hypothetical OOP language:
class A {
    say_something() {
        print("A!")
    }
    do_something() {
        say_something()
    }
}
and
class B : extends A {
    say_something() {
        print("B!")
    }
}
Now suppose you have an instance of B, and call do_something:
B b;
b.do_something();
Should b print A or B? If the call to say_something in the implementation of do_something goes through the vtable (in C++) or objc_msgSend (in Obj-C), it will print B; but if the compiler decides that a call to a method of the same class should resolve to that class's own method, it will print A.
Which is desirable depends on the situation. So in C++ you have a choice: mark the member function virtual or not.
In Objective-C, every method is virtual.
So, even in C++, the optimization you suggest is not applied to virtual methods. And I wouldn't call a language without virtual methods a proper OOP language.
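To make the choice concrete, here is the A/B example above rendered as a small, self-contained C++ sketch (my own, not from the original answer): a call through a virtual function dispatches to B's override, while a non-virtual call is bound statically to A's version.
#include <cstdio>

struct A {
    virtual ~A() = default;
    virtual void say_something() { std::puts("A!"); }   // dynamically bound
    void say_something_static()  { std::puts("A!"); }   // statically bound
    void do_something() {
        say_something();         // goes through the vtable: prints "B!" for a B
        say_something_static();  // resolved at compile time: always prints "A!"
    }
};

struct B : A {
    void say_something() override { std::puts("B!"); }
    void say_something_static()   { std::puts("B!"); }  // hides, does not override
};

int main() {
    B b;
    b.do_something();  // prints "B!" then "A!"
}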
Anyway, now you might worry that this would degrade the performance of an Obj-C app. Apple knows the danger, so they have lots and lots of optimizations. For example, in a recent runtime, objc_msgSend lives at a fixed address, so you don't first need to go through the shared-library boundary.
The Mulle-Kybernetik article is also nice, but note that it's very dated: it's a whopping ten years old, which in this business is prehistoric. You can read about recent optimizations put into the implementation of objc_msgSend in bbum's blog posts or Hamster's blog posts.
If you are writing code that manages UI, the standard messaging (i.e. objc_msgSend behind the scenes) is perfectly adequate. It even works very smoothly on mobile devices! So don't do premature optimization by grabbing the IMP, etc., until it turns out to be absolutely necessary after measuring the performance of the code.
If you're writing calculation-intensive code involving, say, tens of thousands of particles moving, then yes, I would advise against using the Obj-C messaging machinery. In that case, simply write C code (which can always be used in Objective-C) or C++ code (which can be used in Objective-C++).
All Objective-C method calls use objc_msgSend. This is because the code doesn't know the actual, concrete class of self; i.e. it could be a subclass with another implementation, or even a proxy for a method that is implemented by a remote machine via XML-RPC, for example.
AFAIK you can optimise things yourself by grabbing the implementation of a selector for a given object if you know you're going to send message X to object Y often.
It still uses dynamic dispatch via objc_msgSend and variants.
Ideally, something similar to a JIT would be produced for Obj-C types.
Unlike Obj-C, a C++ compiler can use static dispatch for calls to virtual functions of C++ types when it can determine the concrete type.
At any rate, there are many ways to optimize -- you could begin by implementing in C++, and then use Obj-C where it's really handy... but I'm not sure what exactly you're trying to accomplish.

What is open recursion?

What is open recursion? Is it specific to OOP?
(I came across this term in this tweet by Daniel Spiewak.)
just copying http://www.comlab.ox.ac.uk/people/ralf.hinze/talks/Open.pdf:
"Open recursion Another handy feature offered by most languages with objects and classes is the ability for one method body to invoke another method of the same object via a special variable called self or, in some langauges, this. The special behavior of self is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first. "
This paper analyzes the possibility of adding OO to ML, with regards to expressivity and complexity. It has the following excerpt on objects, which seems to make this term relatively clear –
3.3. Objects
The simplest form of object is just a record of functions that share a common closure environment carrying the object state (we can call these simple objects). The function members of the record may or may not be defined as mutually recursive. However, if one wants to support inheritance with overriding, the structure of objects becomes more complicated. To enable open recursion, the call-graph of the method functions cannot be hard-wired, but needs to be implemented indirectly, via object self-reference. Object self-reference can be achieved either by construction, making each object a recursive, self-referential value (the fixed-point model), or dynamically, by passing the object as an extra argument on each method call (the self-application or self-passing model). In either case, we will call these self-referential objects.
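To make the self-application (self-passing) model concrete, here is a small C++ sketch of my own (not from the paper): each "method" is a plain function that receives the object as an explicit extra argument, and the call graph stays open because methods reach one another only through that self reference.
#include <cstdio>
#include <functional>

// A "simple object": a record of functions sharing the object's state.
// Methods take the object itself as an explicit argument (self-passing),
// so which method2 gets called is decided by the record, not hard-wired.
struct Obj {
    int state = 0;
    std::function<void(Obj&)> method1;
    std::function<void(Obj&)> method2;
};

Obj make_base() {
    Obj o;
    o.method1 = [](Obj& self) { self.method2(self); };  // open recursion via self
    o.method2 = [](Obj& self) { std::printf("base, state=%d\n", self.state); };
    return o;
}

Obj make_derived() {
    Obj o = make_base();                                 // "inherit"
    o.method2 = [](Obj& self) {                          // "override"
        std::printf("derived, state=%d\n", self.state);
    };
    return o;
}

int main() {
    Obj d = make_derived();
    d.method1(d);   // prints "derived, ...": method1 was defined before the override existed
}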
The name "open recursion" is a bit misleading at first, because it has nothing to do with the kind of recursion normally meant (a function calling itself); and to that extent, there is no corresponding "closed recursion".
It basically means that a thing is referring to itself. I can only guess, but I do think that the term "open" comes from open as in "open for extension".
In that sense an object is open to extension, but still referring to itself.
Perhaps a small example can shed some light on the concept.
Imagine you write a Python class like this one:
class SuperClass:
    def method1(self):
        self.method2()
    def method2(self):
        print("SuperClass")
If you run this:
s = SuperClass()
s.method1()
It will print "SuperClass".
Now we create a subclass from SuperClass and override method2:
class SubClass(SuperClass):
    def method2(self):
        print("SubClass")
and run it:
sub = SubClass()
sub.method1()
Now "SubClass" will be printed.
Still, we only call method1() as before. Inside method1(), method2() is called, but both are bound through the same reference (self in Python, this in Java). When subclassing SuperClass, method2() is overridden, which means that an object of SubClass refers to a different version of this method.
That is open recursion.
In most cases, you override methods and call the overridden methods directly.
This scheme here is using an indirection over self-reference.
P.S.: I don't think this has been invented but discovered and then explained.
Open recursion allows a method to call other methods of the same object from within, through a special variable like this or self.
In short, open recursion is actually about something not specific to OOP, but more general.
The relation with OOP comes from the fact that many typical "OOP" PLs have such properties, but it is essentially not tied to any distinguishing feature of OOP.
So there are different meanings, even within the same "OOP" language. I will illustrate this later.
Etymology
As mentioned here, the term was likely coined in the famous TAPL by BCP, which illustrates the meaning with concrete OOP languages.
TAPL does not define "open recursion" formally. Instead, it points out the "special behavior of self (or this) is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first".
Nevertheless, neither "open" nor "recursion" comes from the OOP basis of a language. (Actually, it also has nothing to do with static types.) So the interpretation (or the informal definition, if any) in that source is overspecified in nature.
Ambiguity
The mention in TAPL clearly shows that "recursion" is about "method invocation". However, it is not that simple in real languages, which usually do not have primitive semantic rules for the recursive invocation itself. Real languages (including the ones considered OOP languages) usually specify the semantics of such invocations through the notation of method calls. As syntactic devices, such calls are subject to the evaluation of some kind of expression, which relies on the evaluation of its subexpressions. These evaluations imply the resolution of the method name, under some independent rules. Specifically, such rules are about name resolution, i.e. determining the denotation of a name (typically a symbol, an identifier, or some "qualified" name expression) in the subexpression. Name resolution usually respects scoping rules.
OTOH, the "late-bound" property emphasizes how the target implementation of the named method is found. This is a shortcut in the evaluation of specific call expressions, but it is not general enough, because entities other than methods can also have such "special" behavior, which can even make the behavior not special at all.
A notable ambiguity comes from this insufficient treatment: what does a "binding" mean? Traditionally, a binding can be modeled as a pair of a (scoped) name and its bound value, i.e. a variable binding. In the special treatment of "late-bound" entities, the set of allowed entities is smaller: methods, instead of all named entities. Besides considerably undermining the abstraction power of the language rules at the meta level (in the language specification), it does not remove the need for the traditional meaning of a binding (because there are other, non-method entities), hence the confusion. "Late-bound" is at least an instance of bad naming; instead of "binding", a more proper term would be "dispatching".
Worse, the usage in TAPL directly mixes the two meanings when dealing with "recursion". The "recursive" behavior is all about finding the entity denoted by some name; it is not specific to method invocation (even in those OOP languages).
The title of the chapter (Case Study: Imperative Objects) also suggests some inconsistency. Obviously, the so-called late binding of method invocation has nothing to do with imperative state, because the resolution of the dispatching does not require mutable invocation metadata. (In the popular style of implementation, the virtual method table need not be modifiable.)
Openness
The use of "open" here looks like a nod to open (lambda) terms. An open term has some names not yet bound, so the reduction of such a term must do some name resolution (to compute the value of the expression), or the term does not normalize (evaluation never terminates). There is no difference between "late" and "early" for the original calculi because they are pure and have the Church-Rosser property, so whether binding is "late" or not does not alter the result (if it normalizes).
This is not the case in a language with potentially different dispatching paths. Even if the implicit evaluation implied by the dispatching itself is pure, it is sensitive to the ordering of other evaluations with side effects that may depend on the concrete invocation target (for example, one overrider may mutate some global state while another does not). Of course, in a strictly pure language there can be no observable differences even between radically different invocation targets, but a language that rules all of them out is just useless.
Then there is another problem: why is it OOP-specific (as in TAPL)? Given that the openness qualifies "binding" rather than "dispatching of method invocation", there are certainly other ways to get that openness.
One notable instance is the evaluation of a procedure body in traditional Lisp dialects. There can be unbound symbols in the body, and they are only resolved when the procedure is called (rather than when it is defined). Since Lisps are significant in PL history and are close to the lambda calculi, attributing "open" specifically to OOP languages (instead of Lisps) is rather strange from the perspective of PL tradition. (This is also a case of the "making them not special at all" mentioned above: every name in a function body is just "open" by default.)
It is also arguable that the OOP-style self/this parameter is equivalent to the result of a closure conversion of the (implicit) environment of the procedure. It is questionable to treat such features as primitive in the language semantics.
(It may also be worth noting that the special treatment of function calls, as distinct from symbol resolution in other expressions, was pioneered by Lisp-2 dialects, not by any typical OOP language.)
More cases
As mentioned above, different meanings of "open recursion" may coexist in the same "OOP" language.
C++ is the first instance here, because there are sufficient reasons for them to coexist.
In C++, name resolution is entirely static; normatively, it is name lookup. The rules of name lookup vary between scopes. Most of them are consistent with the identifier lookup rules of C (except that C allows implicit declarations and C++ does not): you must first declare a name, and only then can it be looked up (lexically) later in the source code; otherwise the program is ill-formed (and the language implementation is required to issue an error). The strictness of this dependency among names is considerably "closed", because there is no later chance to recover from the error, so you cannot directly have names mutually referencing each other across different declarations.
To work around the limitation, there can be additional declarations whose sole duty is to break the cyclic dependency. Such declarations are called "forward" declarations. Using forward declarations still does not require "open" recursion, because every well-formed use must statically see a previous declaration of that name, so no name lookup requires any additional "late" binding.
However, C++ classes have special name lookup rules: some entities in class scope can be referred to in contexts prior to their declaration. This makes mutually recursive use of names across different declarations possible without any additional "forward" declarations to break the cycle. This is exactly "open recursion" in the TAPL sense, except that it is not about method invocation.
Moreover, C++ does have "open recursion" as described in TAPL: the this pointer and virtual functions. The rules for determining the target (the final overrider) of a virtual function call are independent of the name lookup rules. A non-static member defined in a derived class generally just hides entities with the same name in the base classes. The dispatching rules kick in only on virtual function calls, after name lookup (the order is guaranteed, since evaluation of C++ function calls is strict, i.e. applicative). It is also easy to introduce a base-class name with a using-declaration without worrying about the kind of entity it names.
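As a small illustrative sketch (mine, not from the original answer) of both points: a member can be used in a member function body before its declaration appears in the class, and a call through this to a virtual function is resolved against the dynamic type.
#include <cstdio>

struct Base {
    virtual ~Base() = default;

    // Class scope is "open": helper() is used here although it is declared
    // further down in the class body; no forward declaration is needed.
    void run() { helper(); }

    void helper() { step(); }          // call through this: late-bound

    virtual void step() { std::puts("Base::step"); }
};

struct Derived : Base {
    void step() override { std::puts("Derived::step"); }  // defined "later", still reached
};

int main() {
    Derived d;
    d.run();   // prints "Derived::step"
}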
Such a design can be seen as an instance of separation of concerns. The name lookup rules allow some generic static analysis in the language implementation without special treatment of function calls.
OTOH, Java has more complex rules that mix name lookup with other rules, including how overriders are identified. Name shadowing in Java subclasses is specific to the kind of entity. It is more complicated to distinguish overriding from overloading/shadowing/hiding/obscuring for the different kinds. There is also no equivalent of C++'s using-declarations in the definition of subclasses. Such complexity does not make Java any more or less "OOP" than C++, anyway.
Other consequences
Conflating the binding of name resolution with the dispatching of method invocation leads not only to ambiguity, complexity and confusion, but also to more difficulties at the meta level. Here "meta" refers to the fact that name binding can expose properties that are not only available in the source language semantics, but also subject to the meta languages: either the formal semantics of the language or its implementation (say, the code implementing an interpreter or a compiler).
For example, as in traditional Lisps, binding time can be distinguished from evaluation time, because program properties revealed at binding time (value bindings in the immediate contexts) are closer to meta properties than evaluation-time properties are (like the concrete values of arbitrary objects). An optimizing compiler can drive code generation from binding-time analysis, either statically at compile time (when the body is to be evaluated more than once) or deferred to runtime (when compilation is too expensive). There is no such option for languages that blindly assume all resolutions in closed recursion are faster than open ones (and that even make them syntactically different in the first place). In that sense, OOP-specific open recursion is not just less handy than advertised in TAPL, but a premature optimization: it gives up metacompilation too early, not in the language implementation, but in the language design.

Is it good convention for a class to perform functions on itself?

I've always been taught that if you are doing something to an object, that should be an external thing, so one would Save(Class) rather than having the object save itself: Class.Save().
I've noticed that in the .Net libraries, it is common to have a class modify itself as with String.Format() or sort itself as with List.Sort().
My question is, in strict OOP is it appropriate to have a class which performs functions on itself when called to do so, or should such functions be external and called on an object of the class' type?
Great question. I have just recently reflected on a very similar issue and was eventually going to ask much the same thing here on SO.
In OOP textbooks, you sometimes see examples such as Dog.Bark(), or Person.SayHello(). I have come to the conclusion that those are bad examples. When you call those methods, you make a dog bark, or a person say hello. However, in the real world, you couldn't do this; a dog decides for itself when it's going to bark. A person decides for themselves when to say hello to someone. Therefore, these methods would more appropriately be modelled as events (where supported by the programming language).
You would e.g. have a function Attack(Dog), PlayWith(Dog), or Greet(Person) which would trigger the appropriate events.
Attack(dog) // triggers the Dog.Bark event
Greet(johnDoe) // triggers the Person.SaysHello event
As soon as you have more than one parameter, it won't be so easy deciding how to best write the code. Let's say I want to store a new item, say an integer, into a collection. There's many ways to formulate this; for example:
StoreInto(1, collection) // the "classic" procedural approach
1.StoreInto(collection) // possible in .NET with extension methods
Store(1).Into(collection) // possible by using state-keeping temporary objects
According to the thinking laid out above, the last variant would be the preferred one, because it doesn't force an object (the 1) to do something to itself. However, if you follow that programming style, it will soon become clear that this fluent interface-like code is quite verbose, and while it's easy to read, it can be tiring to write or even hard to remember the exact syntax.
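For illustration, here is a tiny C++ sketch (with hypothetical names, not from the original answer) of the state-keeping temporary object behind a Store(1).Into(collection) style call:
#include <vector>

// Temporary object that remembers the value until .Into() names the target.
template <typename T>
struct StoreExpr {
    T value;
    template <typename Container>
    void Into(Container& c) const { c.push_back(value); }
};

template <typename T>
StoreExpr<T> Store(T value) { return StoreExpr<T>{value}; }

int main() {
    std::vector<int> collection;
    Store(1).Into(collection);   // reads like prose; neither 1 nor the vector "does it to itself"
}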
P.S.: Concerning global functions: In the case of .NET (which you mentioned in your question), you don't have much choice, since the .NET languages do not provide for global functions. I think these would be technically possible with the CLI, but the languages disallow that feature. F# has global functions, but they can only be used from C# or VB.NET when they are packed into a module. I believe Java also doesn't have global functions.
I have come across scenarios where this lack is a pity (e.g. with fluent interface implementations). But generally, we're probably better off without global functions, as some developers might always fall back into old habits, and leave a procedural codebase for an OOP developer to maintain. Yikes.
Btw., in VB.NET, however, you can mimic global functions by using modules. Example:
Globals.vb:
Module Globals
    Public Sub Save(ByVal obj As SomeClass)
        ...
    End Sub
End Module
Demo.vb:
Imports Globals
...
Dim obj As SomeClass = ...
Save(obj)
I guess the answer is "It Depends"... for Persistence of an object I would side with having that behavior defined within a separate repository object. So with your Save() example I might have this:
repository.Save(class)
However with an Airplane object you may want the class to know how to fly with a method like so:
airplane.Fly()
This is one of the examples I've seen from Fowler about an anemic data model. I don't think in this case you would want to have a separate service like this:
new airplaneService().Fly(airplane)
With static methods and extension methods it makes a ton of sense, as in your List.Sort() example. So it depends on your usage patterns. You wouldn't want to have to new up an instance of a ListSorter class just to be able to sort a list like this:
new listSorter().Sort(list)
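As a minimal C++ sketch of the distinction this answer draws (illustrative names only, not from the original answer): persistence lives in a separate repository object, while behaviour intrinsic to the thing stays a member.
#include <iostream>
#include <string>

struct Airplane {
    std::string id;
    void fly() { std::cout << id << " is flying\n"; }   // intrinsic behavior: a member
};

// Persistence is a separate concern, so it lives outside the class.
struct AirplaneRepository {
    void save(const Airplane& plane) {
        std::cout << "saving " << plane.id << " to storage\n";  // stand-in for real I/O
    }
};

int main() {
    Airplane plane{"N12345"};
    plane.fly();
    AirplaneRepository repo;
    repo.save(plane);
}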
In strict OOP (Smalltalk or Ruby), all methods belong to an instance object or a class object. In "real" OOP (like C++ or C#), you will have static methods that essentially stand completely on their own.
Going back to strict OOP, I'm more familiar with Ruby, and Ruby has several "pairs" of methods that either return a modified copy or modify the object in place -- a method ending with a ! indicates that the message modifies its receiver. For instance:
>> s = 'hello'
=> "hello"
>> s.reverse
=> "olleh"
>> s
=> "hello"
>> s.reverse!
=> "olleh"
>> s
=> "olleh"
The key is to find some middle ground between pure OOP and pure procedural that works for what you need to do. A Class should do only one thing (and do it well). Most of the time, that won't include saving itself to disk, but that doesn't mean Class shouldn't know how to serialize itself to a stream, for instance.
I'm not sure what distinction you seem to be drawing when you say "doing something to an object". In many if not most cases, the class itself is the best place to define its operations, as under "strict OOP" it is the only code that has access to internal state on which those operations depend (information hiding, encapsulation, ...).
That said, if you have an operation which applies to several otherwise unrelated types, then it might make sense for each type to expose an interface which lets the operation do most of the work in a more or less standard way. To tie it in to your example, several classes might implement an interface ISaveable which exposes a Save method on each. Individual Save methods take advantage of their access to internal class state, but given a collection of ISaveable instances, some external code could define an operation for saving them to a custom store of some kind without having to know the messy details.
It depends on what information is needed to do the work. If the work is unrelated to the class (mostly equivalently, can be made to work on virtually any class with a common interface), for example, std::sort, then make it a free function. If it must know the internals, make it a member function.
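A small C++ sketch of that rule of thumb (illustrative class names, my own): std::sort only needs the container's public iterator interface, so it stays a free function, while an operation that depends on private state is a member.
#include <algorithm>
#include <vector>

class Account {
public:
    explicit Account(double balance) : balance_(balance) {}

    // Needs the private state, so it is a member function.
    void applyInterest(double rate) { balance_ += balance_ * rate; }

    double balance() const { return balance_; }

private:
    double balance_;
};

int main() {
    std::vector<double> values{3.0, 1.0, 2.0};
    std::sort(values.begin(), values.end());   // free function: works on any random-access range

    Account acct(100.0);
    acct.applyInterest(0.05);                  // member: requires access to internals
}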
Edit: Another important consideration is performance. In-place sorting, for example, can be miles faster than returning a new, sorted copy. This is why quicksort is faster than merge sort in the vast majority of cases, even though merge sort has the better worst-case complexity: quicksort can be performed in place, whereas an in-place merge sort is rarely seen in practice. Just because it's technically possible to perform an operation through the class's public interface doesn't mean that you actually should.

How is a dynamically-typed language implemented on top of a statically-typed language?

I've only recently come to really grasp the difference between static and dynamic typing, by starting off with C++, and moving into Python and JavaScript. What I don't understand is how a dynamically-typed language (e.g. Python) can be implemented on top of a statically-typed language (e.g. C). I seem to remember reading something about void pointers once, but I didn't really get it.
Every variable in the dynamically typed language is represented as a struct { type, value }, where the value is a union, another struct, a pointer, etc.
In C++ you can get a similar ("similar") result if, for example, you create an abstract base class MyVariable and derived MyInt, MyString, etc. With some more work, you can use these variables much as in a dynamically typed language. (I don't know C++ very well, but I think you'd need friend operator functions to change a variable's type at runtime, or maybe not, whatever.)
This result is achieved by the same thing, runtime type information, which stores info about the actual type in the object.
I won't recommend it, though :)
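For what it's worth, here is a hedged sketch of that class-hierarchy approach (the MyVariable/MyInt/MyString names come from the answer above; everything else is illustrative):
#include <iostream>
#include <memory>
#include <string>

// Abstract base: every dynamic value is "a MyVariable of some concrete kind".
struct MyVariable {
    virtual ~MyVariable() = default;
    virtual void print(std::ostream& os) const = 0;
};

struct MyInt : MyVariable {
    int value;
    explicit MyInt(int v) : value(v) {}
    void print(std::ostream& os) const override { os << value; }
};

struct MyString : MyVariable {
    std::string value;
    explicit MyString(std::string v) : value(std::move(v)) {}
    void print(std::ostream& os) const override { os << value; }
};

int main() {
    // "Retyping" a variable at runtime is just pointing it at a different derived object.
    std::unique_ptr<MyVariable> var = std::make_unique<MyInt>(42);
    var->print(std::cout); std::cout << '\n';

    var = std::make_unique<MyString>("now I'm a string");
    var->print(std::cout); std::cout << '\n';
}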
Basically, each "variable" of your dynamically typed language is represented by a structure in the statically typed language, with the data type being one of the fields. The operations on these dynamic data types (add, subtract, compare) are usually implemented through a virtual method table: for each data type, a set of pointers to functions that implement the desired functionality in a type-specific way.
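A hedged sketch of that representation (all names illustrative, roughly how an interpreter core might do it): each value carries a type tag, and operations are selected from a per-type function table.
#include <cstdio>

// Every dynamic value carries a type tag plus a payload.
enum class Tag { Int, Double };

struct Value {
    Tag tag;
    union Payload {
        long   i;
        double d;
    } as;
};

Value makeInt(long i)      { Value v; v.tag = Tag::Int;    v.as.i = i; return v; }
Value makeDouble(double d) { Value v; v.tag = Tag::Double; v.as.d = d; return v; }

// Per-type operation table: one "add" implementation per tag.
using AddFn = Value (*)(Value, Value);

static Value addInt(Value a, Value b)    { return makeInt(a.as.i + b.as.i); }
static Value addDouble(Value a, Value b) { return makeDouble(a.as.d + b.as.d); }

static const AddFn addTable[] = { addInt, addDouble };  // indexed by Tag

Value add(Value a, Value b) {
    // A real interpreter would also handle mixed tags (coercion); omitted here.
    return addTable[static_cast<int>(a.tag)](a, b);
}

int main() {
    std::printf("%ld\n", add(makeInt(2), makeInt(3)).as.i);   // prints 5
}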
It's not. The dynamically typed language is implemented on top of a CPU architecture. As long as the CPU architecture is Turing complete, you can implement a static language on it, or a dynamic language, or something hybrid like the CLR/DLR of .NET. The important thing is that the Turing completeness of the CPU architecture is what enables or disables things, not the static nature of a programming language like C or C++.
In general, programming languages maintain Turing completeness, and therefore you can implement anything in any programming language. Of course, some things are easier if the underlying tools support them, so it is not easy to implement an application that relies on dynamic underpinnings in C or C++. That's why people put the effort into building a programmable dynamic system, like Python: the dynamic machinery is implemented once, that extra effort is paid only one time, and it can then be reused from the dynamic-language layer.