Java inherited field shadowing and the JVM? - jvm

Is the mechanism by which fields are shadowed/hidden by inheritance, and later resolved, part of the JVM spec? I know it is part of the Java spec, and it can be found in many blog posts and SO questions. However, when I actually look at the JVM spec for field resolution, the words "hiding" or "shadowing" do not appear anywhere in the PDF of the JVM spec.
I ask this because I am writing my own JVM, and discovered that this field shadowing is a property of the bytecode/VM and not just part of the Java compiler or Java-the-language. I want to know the proper, authoritative way this should be implemented at the VM level. Surely a (mis?)feature of the JVM this important needs to be formally documented somewhere?

The term shadowing usually refers to one identifier shadowing another, i.e. a given name could refer to multiple variables, so there has to be a mechanism to disambiguate it. This is mostly a language-level concern, because source code contains far more names than the bytecode does. Local variable names, for instance, don't appear in the bytecode at all, except as optional debugging information.
From the bytecode point of view, you already have an explicit reference to a class, name, and descriptor. The only question is whether the field you're describing is actually declared in the class you specified, or whether it was inherited from one of its superclasses.
As you already discovered, Field Resolution is explained in section 5.4.3.2 of the standard. The terms hiding and shadowing are not used because they apply to source code, not classfiles.
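For illustration, here is a minimal Java example of field hiding (the class names are made up, not taken from the spec), together with what the compiler emits and how resolution proceeds:

class Base {
    int x = 1;
}

class Derived extends Base {
    int x = 2;  // hides Base.x; a Derived object contains storage for both fields
}

public class FieldHidingDemo {
    public static void main(String[] args) {
        Derived d = new Derived();
        Base b = d;                // the same object, viewed through the Base type
        System.out.println(d.x);   // 2 - javac emits getfield Derived.x
        System.out.println(b.x);   // 1 - javac emits getfield Base.x
    }
}

The VM never chooses between the two fields on its own: each getfield names a class in the constant pool, and resolution (JVMS 5.4.3.2) starts at that class and only walks up through superinterfaces and superclasses when the field is not declared there. The "shadowing" decision was already made by the compiler when it picked which class to name in the symbolic reference.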


Type pool or class of constants?

What is the difference between Type-pool and creating a class for constants?
Which is better?
My question concerns a large group of constants that should be accessible to other groups.
Thank you.
EDIT - Thank you for the answers; I will improve my question. I need somewhere to store constants that I will use in programs or other classes. Basically, I wanted to know whether it is better to use a type-pool or a class containing only constants. I can have more than one class or type-pool.
The documentation mentions this:
Since it is possible to also define data types and constants in the public visibility section of global classes, type groups are obsolete and should no longer be created. Existing type groups can still be used.
A sensibly named interface with the constants you desire is the way to go. An additional benefit is that ABAP OO enforces some more rules.
Agree with #petul's answer, except for one detail: I'd recommend creating one enumeration-like class per logical group of constants, instead of collecting constants in interfaces.
Consider using the new enum language feature for specifying the constant values.
Interfaces can be accidentally "implemented", which doesn't make sense here. Classes can prevent this with final.
Making one class per logical group simplifies finding the constants with IDE features such as Ctrl+Shift+A search in the ABAP Development Tools. Constants that are randomly thrown together into interfaces are hard to find later on.
Classes allow adding enumeration-like helper methods like converters, existence checks, numbering all values.
Classes also allow adding unit tests, such as ensuring that the constant collection is still in sync with the fixed values of an underlying domain.
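The advice above is about ABAP, but the shape of the pattern is easy to sketch in Java terms (this is just an illustration with invented names, not ABAP syntax): one final, non-instantiable class per logical group of constants, with room for enumeration-like helpers.

// Illustration only (Java, not ABAP); all names are invented.
public final class DocumentStatus {
    public static final String OPEN      = "O";
    public static final String RELEASED  = "R";
    public static final String COMPLETED = "C";

    private DocumentStatus() { }  // final class + private constructor: cannot be "implemented" or instantiated

    // Room for enumeration-like helpers, e.g. a simple existence check.
    public static boolean isValid(String value) {
        return OPEN.equals(value) || RELEASED.equals(value) || COMPLETED.equals(value);
    }
}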

extending objects at run-time via categories?

Objective-C’s objects are pretty flexible when compared to similar languages like C++ and can be extended at runtime via Categories or through runtime functions.
Any idea what this sentence means? I am relatively new to Objective-C
While technically true, it may be confusing to the reader to call category extension "at runtime." As Justin Meiners explains, categories allow you to add additional methods to an existing class without requiring access to the existing class's source code. The use of categories is fairly common in Objective-C, though there are some dangers. If two different categories add the same method to the same class, the behavior is undefined. Since you cannot know whether some other part of the system (perhaps even a system library) adds a category method, you typically must add a prefix to prevent collisions (for example, rather than swappedString, a better name would be something like rnc_swappedString if this were part of RNCryptor).
As I said, it is technically true that categories are added at runtime, but from the programmer's point of view categories are written as though they were just part of the class, so most people think of them as a compile-time choice. It is very rare to decide at runtime whether to add a category method or not.
As a beginner, you should be aware of categories, but be slow to create new ones. Creating categories is a somewhat intermediate-level skill. It's not something to avoid, but not something you'll use every day. It's very easy to overuse them. See Justin's link for more information.
On the other hand, "runtime functions" really do add new functionality to existing classes or even specific objects at runtime, and are completely under the control of code. You can, at runtime, modify a class such that it responds to a method it didn't previously respond to. You can even generate entirely new classes at runtime that did not exist when the program was compiled, and you can change the class of existing objects. (This is exactly how Key-Value Observation is implemented.)
Modifying classes and objects using the runtime is an advanced skill. You should not even consider using these techniques in production code until you have significant experience. And when you have that experience, it will tell you that you very seldom want to do this anyway. You will know the runtime functions because they are C-based, with names like method_exchangeImplementations. You won't mistake them for normal ObjC (and you generally have to import objc/runtime.h to get to them).
There is a middle ground that bleeds into runtime manipulation, called message forwarding and dynamic message resolution. This is often used for proxy objects, and is implemented with -forwardingTargetForSelector:, +resolveInstanceMethod:, and some similar methods. These are tools that allow classes to modify themselves at runtime, which is much less dangerous than modifying other classes (i.e. "swizzling").
It's also important to consider how all of this translates to Swift. In general, Swift has discouraged and restricted the use of runtime class manipulation, but it embraces (and improves) category-like extensions. By the time you're experienced enough to dig into the runtime, you will likely find it an even more obscure skill than it is today. But you will use extensions (Swift's version of categories) in every program.
A category allows you to add functionality to an existing class that you do not have access to source code for (System frameworks, 3rd party APIs etc). This functionality is possible by adding methods to a class at runtime.
For example lets say I wanted to add a method to NSString that swapped uppercase and lowercase letters called -swappedString. In static languages (such as C++), extending classes like this is more difficult. I would have to create a subclass of NSString (or a helper function). While my own code could take advantage of my subclass, any instance created in a library would not use my subclass and would not have my method.
Using categories, I can extend any class, such as NSString, by adding a -swappedString method and then use it transparently on any instance of the class: [anyString swappedString];.
You can learn more details from Apple's Docs

Difference between INCLUDE and modules in Fortran

What are the practical differences between using modules with the use statement or isolated files with the include statement? I mean, if I have a subroutine that is used a lot throughout a program: when or why should I put it inside a module or just write it in a separate file and include it in every other part of the program where it needs to be used?
Also, would it be good practice to write all subroutines intended to go in a module in separate files and use include inside the module? Especially if the code in the subroutines is long, so as to keep the code better organized (that way all the subroutines are packed into the module, but if I have to edit one I don't need to go through a maze of code).
The conceptual differences between the two map through to very significant practical differences.
An INCLUDE line operates at the source level - it accomplishes simple ("dumb") text inclusion. In the absence of any special processor interpretation of the "filename" in the include line (there is no requirement for that actually to be a file), the complete source could quite easily be spliced together manually by the programmer and fed to the compiler with no difference whatsoever in the semantics of the source. Included source has no real interpretation in isolation - its meaning depends completely on the context in which the include line that references it appears.
Modules operate at the much higher entity level of the program, i.e. at the level where the compiler is considering the things that the source actually describes. A module can be compiled in isolation from its downstream users, and once it has been compiled the compiler knows exactly what things the module can provide to the program.
Typically what someone using include lines is hoping to do is what modules were actually designed to do.
Example issues:
Because entity declarations can be spread over multiple statements the entities described by included source might not be what you expect. Consider the following source to be included:
INTEGER :: i
In isolation it looks like this declares the name i as an integer scalar (or perhaps a function? Who knows!). Now consider the following scope that includes the above:
INCLUDE "source from above"
DIMENSION :: i(10,10)
i is now a rank two array! Perhaps you want to make it a POINTER? An ALLOCATABLE? A dummy argument? Perhaps that results in an error, or perhaps it is valid source! Throw implicit typing into the mix to really compound the potential fun.
An entity defined in a module is "completely" defined by the module. Attributes that are specific to the scope of use can be changed (VOLATILE, accessibility, etc), but the fundamental entity remains the same. Name clashes are explicitly called out and can be easily worked around with a rename clause on the USE statement.
Fortran has restrictions on statement ordering (specification statements must go before executable statements, etc.). Included source is also subject to those restrictions, again in the context of the point of inclusion, not the point of source definition.
Mix well with source ambiguity between statement function definitions (specification part) and assignment statements (executable part) for some completely obtuse error messages or, worse, silent acceptance by the compiler of erroneous code.
There are requirements on where the USE statement that references a module appears, but the source for the actual module program unit is completely independent of its point of use.
Fancy having some global state to be shared across related procedures and you want to use include? Let me introduce you to common blocks and the associated underlying concept of sequence association...
Sequence association is an unfortunate bleed-through of early underlying Fortran processor implementation; it is an error-prone, inflexible, anti-optimisation anachronism.
Module variables make common blocks and their associated evils completely unnecessary.
If you were using include lines, note that you don't actually include the source of a commonly used procedure (the suggestion in your first paragraph is just going to result in a morass of syntax errors from the compiler). What you would typically do is include source that describes the interface of the procedure. For any non-trivial procedure the source that describes the interface is different from the complete source of the procedure - implying that you now need to maintain two source representations of the same thing. This is an error-prone maintenance burden.
As mentioned - the compiler automatically gains knowledge of the interface of a module procedure (the compiler's knowledge is "explicit" because it actually saw the procedure's code - hence the term "explicit interface"). There is no need for the programmer to do anything more.
A consequence of the above is that external subprograms should not be used at all unless there are very good reasons to the contrary (perhaps the existence of circular or excessively extensive dependencies) - the basic starting point should be to put everything in a module or main program.
Other posters have mentioned the source code organisation benefits of modules - including the ability to group related procedures and other "stuff" into the one package, with control over accessibility of internal implementation details.
I accept there is a valid use of INCLUDE lines as per the second paragraph of the question - where large modules become unwieldy in size. F2008 has addressed this with submodules, which also bring a number of other benefits. Once they become widely supported the include line work-around should be abandoned.
A second valid use is to overcome a lack of support by the language for generic programming techniques (what templates provide in C++) - i.e. where the types of objects involved in an operation may vary, but the token sequence that describes what to do on those objects is essentially the same. It might be another decade or so before the language sorts that out.
Placing procedures into modules and using those modules makes the interface of the procedure explicit. It allows a Fortran compiler to check for consistency between the actual arguments in a call and the dummy arguments of the procedure. This guards against a variety of programmer mistakes. An explicit interface is also necessary for certain "advanced" features of Fortran >=90; for example, optional or keyword arguments. Without the explicit interface, the compiler won't generate the correct call. Merely including a file doesn't provide these advantages.
M.S.B.'s answer is great and is probably the most important reason to prefer modules over include. I'd like to add a few more thoughts.
Using modules reduces your compiled binary size if that is something that is important to you. A module is compiled once, and when you use it you are symbolically loading that module to use the code. When you include a file, you are actually inserting the new code into your routine. If you use include a lot it can cause your binary to be large and also increase your compile time.
You can also use modules to fake OOP style coding in Fortran 90 through clever use of public and private functions and user defined types in a module. Even if you didn't want to do that, it provides a nice way to group functions that logically belong together.

What is open recursion?

What is open recursion? Is it specific to OOP?
(I came across this term in this tweet by Daniel Spiewak.)
just copying http://www.comlab.ox.ac.uk/people/ralf.hinze/talks/Open.pdf:
"Open recursion Another handy feature offered by most languages with objects and classes is the ability for one method body to invoke another method of the same object via a special variable called self or, in some langauges, this. The special behavior of self is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first. "
This paper analyzes the possibility of adding OO to ML, with regards to expressivity and complexity. It has the following excerpt on objects, which seems to make this term relatively clear –
3.3. Objects
The simplest form of object is just a record of functions that share a common closure environment that carries the object state (we can call these simple objects). The function members of the record may or may not be defined as mutually recursive. However, if one wants to support inheritance with overriding, the structure of objects becomes more complicated. To enable open recursion, the call-graph of the method functions cannot be hard-wired, but needs to be implemented indirectly, via object self-reference. Object self-reference can be achieved either by construction, making each object a recursive, self-referential value (the fixed-point model), or dynamically, by passing the object as an extra argument on each method call (the self-application or self-passing model). In either case, we will call these self-referential objects.
The name "open recursion" is a bit misleading at first, because it has nothing to do with the recursion that normally is used (a function calling itself); and to that extent, there is no closed recursion.
It basically means, that a thing is referring to itself. I can only guess, but I do think that the term "open" comes from open as in "open for extension".
In that sense an object is open to extension, but still referring to itself.
Perhaps a small example can shed some light on the concept.
Imagine you write a Python class like this one:
class SuperClass:
    def method1(self):
        self.method2()

    def method2(self):
        print(self.__class__.__name__)
If you run this with
s = SuperClass()
s.method1()
It will print "SuperClass".
Now we create a subclass from SuperClass and override method2:
class SubClass(SuperClass):
    def method2(self):
        print(self.__class__.__name__)
and run it:
sub = SubClass()
sub.method1()
Now "SubClass" will be printed.
Still, we only call method1() as before. Inside method1(), method2() is called, but both are bound to the same reference (self in Python, this in Java). When subclassing SuperClass, method2() is overridden, which means that an object of SubClass refers to a different version of this method.
That is open recursion.
In most cases, you override methods and call the overridden methods directly.
The scheme here instead uses an indirection through the self-reference.
P.S.: I don't think this has been invented but discovered and then explained.
Open recursion allows one method of an object to call other methods of the same object from within, through a special variable like this or self.
In short, open recursion is about something not actually specific to OOP, but more general.
The relation to OOP comes from the fact that many typical "OOP" languages have such properties, but it is essentially not tied to any distinguishing feature of OOP.
So there are different meanings, even within the same "OOP" language. I will illustrate this later.
Etymology
As mentioned here, the terminology was likely coined in the famous TAPL by BCP, which illustrates the meaning with concrete OOP languages.
TAPL does not define "open recursion" formally. Instead, it points out the "special behavior of self (or this) is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first".
Nevertheless, neither "open" nor "recursion" comes from the OOP basis of a language. (Actually, it also has nothing to do with static types.) So the interpretation (or the informal definition, if any) in that source is overspecified in nature.
Ambiguity
The mention in TAPL clearly shows that "recursion" is about "method invocation". However, it is not that simple in real languages, which usually do not have primitive semantic rules for the recursive invocation itself. Real languages (including the ones considered OOP languages) usually specify the semantics of such invocations via the notation of method calls. As syntactic devices, such calls are subject to the evaluation of some kind of expression, which relies on the evaluation of its subexpressions. These evaluations imply the resolution of the method name under some independent rules. Specifically, such rules are about name resolution, i.e. determining the denotation of a name (typically a symbol, an identifier, or some "qualified" name expression) in the subexpression. Name resolution usually respects scoping rules.
OTOH, the "late-bound" property emphasizes how to find the target implementation of the named method. This is a shortcut for the evaluation of specific call expressions, but it is not general enough, because entities other than methods can also have such "special" behavior, which can even make such behavior not special at all.
A notable ambiguity comes from this insufficient treatment. That is: what does a "binding" mean? Traditionally, a binding can be modeled as a pair of a (scoped) name and its bound value, i.e. a variable binding. In the special treatment of "late-bound" ones, the set of allowed entities is smaller: methods instead of all named entities. Besides considerably undermining the abstraction power of the language rules at the meta level (in the language specification), it does not remove the need for the traditional meaning of a binding (because there are other, non-method entities), hence the confusion. The use of "late-bound" is at least an instance of bad naming; instead of "binding", a more proper term would be "dispatching".
Worse, the usage in TAPL directly mixes the two meanings when dealing with "recursion". The "recursion" behavior is all about finding the entity denoted by some name, and is not specific to method invocation (even in those OOP languages).
The title of the chapter (Case Study: Imperative Objects) also suggests some inconsistency. Obviously, the so-called late binding of method invocation has nothing to do with imperative state, because resolving the dispatch does not require mutable metadata about the invocation. (In the popular implementation approach, the virtual method table need not be modifiable.)
Openness
The use of "open" here looks like mimic to open (lambda) terms. An open term has some names not bound yet, so the reduction of such a term must do some name resolution (to compute the value of the expression), or the term is not normalized (never terminate in evaluation). There is no difference between "late" or "early" for the original calculi because they are pure, and they have the Church-Rosser property, so whether "late" or not does not alter the result (if it is normalized).
This is not the same in the language with potentially different paths of dispatching. Even that the implicit evaluation implied by the dispatching itself is pure, it is sensitive to the order among other evaluations with side effects which may have dependency on the concrete invocation target (for example, one overrider may mutate some global state while another can not). Of course in a strictly pure language there can be no observable differences even for any radically different invocation targets, a language rules all of them out is just useless.
Then there is another problem: why it is OOP-specific (as in TAPL)? Given that the openness is qualifying "binding" instead of "dispatching of method invocation", there are certainly other means to get the openness.
One notable instance is the evaluation of a procedure body in traditional Lisp dialects. There can be unbound symbols in the body and they are only resolved when the procedure being called (rather than being defined). Since Lisps are significant in PL history and the are close to lambda calculi, attributing "open" specifically to OOP languages (instead of Lisps) is more strange from the PL tradition. (This is also a case of "making them not special at all" mentioned above: every names in function bodies are just "open" by default.)
It is also arguable that the OOP style of self/this parameter is equivalent to the result of some closure conversion from the (implicit) environment in the procedure. It is questionable to treat such features primitive in the language semantics.
(It may be also worth noting, the special treatment of function calls from symbol resolution in other expressions is pioneered by Lisp-2 dialects, not any of typical OOP languages.)
More cases
As mentioned above, different meanings of "open recursion" may coexist in the same "OOP" language.
C++ is the first instance here, because there are sufficient reasons for them to coexist.
In C++, name resolution is entirely static; normatively it is called name lookup. The rules of name lookup vary across different scopes. Most of them are consistent with the identifier lookup rules in C (except that C allows implicit declarations and C++ does not): you must first declare a name, and only then can the name be looked up later (lexically) in the source code; otherwise the program is ill-formed (and the implementation of the language is required to issue an error). Such strict dependencies between names are considerably "closed", because there is no later chance to recover from the error, so you cannot directly have names mutually referencing each other across different declarations.
To work around the limitation, there can be additional declarations whose sole duty is to break the cyclic dependency. Such declarations are called "forward" declarations. Using forward declarations still does not require "open" recursion, because every well-formed use must statically see the previous declaration of that name, so no name lookup requires additional "late" binding.
However, C++ classes have special name lookup rules: some entities in class scope can be referred to in contexts prior to their declaration. This makes the mutually recursive use of names across different declarations possible without any additional "forward" declarations to break the cycle. This is exactly "open recursion" in the TAPL sense, except that it is not about method invocation.
Moreover, C++ does have "open recursion" as per the description in TAPL: the this pointer and virtual functions. The rules that determine the target (the final overrider) of a virtual function call are independent of the name lookup rules. A non-static member defined in a derived class generally just hides entities with the same name in the base classes. The dispatch rules kick in only on virtual function calls, after name lookup (the order is guaranteed since evaluations of C++ function calls are strict, or applicative). It is also easy to introduce a base class name with a using-declaration without worrying about the kind of entity it names.
Such a design can be seen as an instance of separation of concerns. The name lookup rules allow some generic static analysis in the language implementation without special treatment of function calls.
OTOH, Java has more complex rules that mix name lookup with other rules, including the rules that identify overriders. Name shadowing in Java subclasses is specific to the kind of entity. It is more complicated to distinguish overriding from overloading/shadowing/hiding/obscuring for the different kinds. There is also no equivalent of C++'s using-declarations in the definition of subclasses. Such complexity does not make Java more or less "OOP" than C++, anyway.
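To make the distinction concrete, here is a small Java illustration (the names are invented, not from TAPL): field access is resolved purely by name lookup against the static type (hiding), while instance method calls are dispatched on the runtime type (overriding).

class A {
    String kind = "A";
    String kind() { return "A"; }
}

class B extends A {
    String kind = "B";                       // hides A.kind: resolved by name lookup on the static type
    @Override String kind() { return "B"; }  // overrides A.kind(): resolved by dispatch on the runtime type
}

public class LookupVsDispatch {
    public static void main(String[] args) {
        A a = new B();
        System.out.println(a.kind);    // prints "A" - plain name lookup, no dispatch
        System.out.println(a.kind());  // prints "B" - dispatched to the overrider
    }
}

The field access and the method call look almost identical in source, yet only the latter is "open" in the sense discussed above.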
Other consequences
Conflating the binding involved in name resolution with the dispatching of method invocation leads not only to ambiguity, complexity and confusion, but also to more difficulties at the meta level. Here "meta" refers to the fact that name binding can expose properties available not only in the source language semantics, but also in the meta languages: either the formal semantics of the language or its implementation (say, the code implementing an interpreter or a compiler).
For example, as in traditional Lisps, binding time can be distinguished from evaluation time, because program properties revealed at binding time (value bindings in the immediate contexts) are closer to meta properties than evaluation-time properties (like the concrete values of arbitrary objects). An optimizing compiler can then decide where to deploy code generation based on binding-time analysis: statically at compile time (when the body is to be evaluated more than once), or deferred to runtime (when compilation is too expensive). There is no such option for a language that blindly assumes all resolutions in closed recursion are faster than open ones (and even makes them syntactically different in the first place). In that sense, OOP-specific open recursion is not only less handy than advertised in TAPL, it is a premature optimization: it gives up metacompilation too early, not in the language implementation, but in the language design.

OOP vs procedural in run-time

I have a very simple question that I can't find an answer to anywhere on the internet.
So, my question is: in procedural programming, code is in the code section, which goes into a read-only memory area, and variables are either on the stack or the heap.
But OOP says that objects are created in memory. So does that mean even functions are written into the R/W memory area?
And does the OS have to have some built-in support for OOP programs? For example, what if the OS doesn't allow reading instructions outside the read-only code section? Thanks.
Generally, both OOP and procedural programming are abstractions which exist only at the source-code level. Once a program is compiled into executable machine-code, these abstractions cease to exist. So whether or not a particular language is OOP or procedural has no bearing on what regions of memory it uses, or where instructions are placed during execution.
The OS itself usually doesn't know or care whether a particular executable was written in an OOP or procedural language. It only cares that the executable uses binary op-codes compatible with its native instruction set, and that the executable has an ABI (binary interface) that it understands.
This is a good question.
Whereas an object conceptually bundles functions and data together in the same place, most implementations split them. The way it is done is that the code is split out and stored in the RO segment; an object in the RW area then has a way to refer back to that code in the RO area. The coupling of code and data is only used conceptually, by the human programmer and the type checker, to ensure that you do not violate the rules and principles.
A Java/C#-like language will usually be implemented such that each object carries a tag identifying its type. The object itself is simply a struct containing all the fields laid out in a prespecified order. The tag can then be used to look up which function in the RO area to call. The function in the RO area is altered to take an extra parameter, called this or self, through which the contents of said object can be reached. When the method needs to refer to fields, it knows the prespecified order, so it can do that correctly. Note that there are some tricks needed to handle inheritance, but this is the crux of the idea.
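A rough sketch of that idea in Java (illustration only, with invented names; real compilers do this lowering on bytecode or machine code, not in source): an instance method behaves much like a plain function that receives the object as an extra parameter.

// Illustration only; names are invented.
class Counter {
    int value;  // per-object data, lives in writable memory

    // Instance method: the code exists once and is shared by all Counter objects.
    int increment() {
        return ++this.value;
    }

    // Roughly what the instance method amounts to: a plain function taking
    // the object ("this"/"self") as an explicit extra argument.
    static int increment(Counter self) {
        return ++self.value;
    }
}

public class DispatchSketch {
    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(c.increment());        // 1
        System.out.println(Counter.increment(c)); // 2 - same effect, explicit "self"
    }
}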
A Python/Ruby-like language will usually make an object be a hash-table where a method is a pointer to the code in the RO-area (provided that the language is compiled and not run through a bytecode interpreter). Function calls are made by looking up the hash-table contents and following the code pointer. Fields are also looked up in the same hash table.
With those basics down, most implementations make tricks to avoid the part where a pointer is followed to find the function to call. They try to figure out and narrow down the possible call to a single function. Then they can replace the lookup with a direct call to the right function, a much faster solution.
The tl;dr version: the language semantics views fields and methods as part of an object. The implementation splits them into RO and RW segments. As such, no OS support is needed.
OOP doesn't say this. I have no idea where you read it; if you add a quote, that would help.
Objects are variables, so what you know about variables is correct for objects. In languages like C# (the .NET framework, actually) objects can only be stored on the heap, because they are so-called reference types. In C++ they can live anywhere.
But OOP says that objects are created in memory. So does that mean even functions are written into the R/W memory area?
From this I concluded that you think that functions are objects. That is far from true in every OOP language; it comes from functional languages, where functions are first-class objects. Functions are in the majority of cases immutable and are placed in read-only sections.
Common OSes like Windows, Linux and macOS are unaware of objects. This is purely a program-level concept. The .NET framework and the Java VM provide a layer of abstraction: they are execution environments that have built-in object support.