Questions About AST Validation - kotlin

I’m writing a transpiler and came across the topic of validating the input. I have some questions, but would also like to double-check whether I understood everything correctly. To my understanding, there are 3 main validations you have to do when transpiling (or compiling) a programming language:
Syntax/grammar validation. This is done in my case by ANTLR which makes sure the input respects the BNF grammar.
Context validation. ANTLR only makes sure the input respects the grammar, but the grammar is context-free: for example the grammar of Java allows public, private, protected access modifiers on a class, but it will allow a class to have all 3 of them, it doesn’t know that a class should only have one of them. So this second validation makes sure that, for example, a class does not have more than one access modifier - I imagine I can do this as a visitor pattern on my AST, right?
Dependencies/references validation. Check that we have, for example, all the classes which are declared as import statements in the current compilation unit - this also seems fairly easy, but what do you do about method references/calls to 3rd party classes? Say, for example, your code calls a class from JDK – how do you check that a reference to that class is correct, do you need to also compile that class and add it to your AST?
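The context validation from the second point can be sketched as a visitor over the AST. This is only an illustration under assumptions: the node types (UnitNode, ClassNode), the Visitor interface and the validator are hypothetical names, not ANTLR classes, and a real AST would carry far more information.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ModifierCheck {
    // Hypothetical, minimal AST: a compilation unit holding class declarations.
    interface Node { void accept(Visitor v); }

    interface Visitor {
        void visitUnit(UnitNode unit);
        void visitClass(ClassNode cls);
    }

    static final class ClassNode implements Node {
        final String name;
        final List<String> modifiers;
        ClassNode(String name, List<String> modifiers) {
            this.name = name;
            this.modifiers = modifiers;
        }
        public void accept(Visitor v) { v.visitClass(this); }
    }

    static final class UnitNode implements Node {
        final List<ClassNode> classes;
        UnitNode(List<ClassNode> classes) { this.classes = classes; }
        public void accept(Visitor v) { v.visitUnit(this); }
    }

    // Semantic pass: the grammar accepts any sequence of modifiers, so the
    // "at most one access modifier" rule is enforced here instead.
    static final class AccessModifierValidator implements Visitor {
        private static final Set<String> ACCESS = Set.of("public", "protected", "private");
        final List<String> errors = new ArrayList<>();

        public void visitUnit(UnitNode unit) {
            for (ClassNode c : unit.classes) c.accept(this);
        }

        public void visitClass(ClassNode cls) {
            long count = cls.modifiers.stream().filter(ACCESS::contains).count();
            if (count > 1) {
                errors.add("class " + cls.name + " has " + count + " access modifiers");
            }
        }
    }

    // Runs the validator over a unit and returns the collected error messages.
    static List<String> validate(UnitNode unit) {
        AccessModifierValidator validator = new AccessModifierValidator();
        unit.accept(validator);
        return validator.errors;
    }

    public static void main(String[] args) {
        UnitNode unit = new UnitNode(List.of(
            new ClassNode("Ok", List.of("public", "final")),
            new ClassNode("Bad", List.of("public", "private"))));
        validate(unit).forEach(System.out::println);
    }
}
```

Collecting errors in a list rather than throwing on the first one lets a single pass report every violation in the unit.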
For example, you can use java.util.List in Kotlin. How does the Kotlin compiler know to tell you if you are using a String instead of an Integer when calling List.get(int index)? Does the Kotlin compiler also compile the java.util.List interface?
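One way to look at this question: a compiler does not usually need to recompile library classes, because already-compiled class files carry their method signatures. The sketch below uses Java reflection as a stand-in for reading those class files; it is not how the Kotlin compiler is implemented, and the helper name returnTypeOf is an assumption made for illustration.

```java
import java.lang.reflect.Method;
import java.util.List;

public class SignatureLookup {
    // Returns the return type of the named public method, or null if the
    // class has no public method with exactly these parameter types.
    static Class<?> returnTypeOf(Class<?> owner, String name, Class<?>... params) {
        try {
            Method m = owner.getMethod(name, params);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // List.get(int) exists, so a call with an int index can be accepted.
        System.out.println(returnTypeOf(List.class, "get", int.class));
        // There is no List.get(String), so such a call can be rejected.
        System.out.println(returnTypeOf(List.class, "get", String.class));
    }
}
```

A transpiler could do something similar for third-party references: resolve the class on the classpath and compare the call's argument types with the declared signatures, without adding the library's source to its own AST.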
Thank you for reading, any response is appreciated.

Related

Kotlin: Idiomatic usage of extension functions - putting extension functions next to the class it extends

I see some usages of extension functions in Kotlin that I personally don't think make sense, but it seems that there are some guidelines that "apparently" support them (a matter of interpretation).
Specifically: defining an extension function outside a class (but in the same file):
data class AddressDTO(
    val state: State,
    val zipCode: String,
    val city: String,
    val streetAddress: String
)
fun AddressDTO.asXyzFormat() = "${streetAddress}\n${city}\n${state.name} $zipCode"
Where the asXyzFormat() is widely used, and cannot be defined as private/internal (but also for the cases it may be).
To my mind, if you own the code (AddressDTO) and the usage is not local to some class / module (and hence not private/internal), there is no reason to define an extension function: just define it as a member function of that class.
Edge case: if you want to avoid serialization of the function starting with get - annotate the class to get the desired behavior (e.g. @JsonIgnore on the function). This IMHO still doesn't justify an extension function.
The counter-response I got to this is that the approach of having an extension function of this fashion is supported by the Official Kotlin Coding Conventions. Specifically:
Use extension functions liberally. Every time you have a function that works primarily on an object, consider making it an extension function accepting that object as a receiver.
Source
And:
In particular, when defining extension functions for a class which are relevant for all clients of this class, put them in the same file where the class itself is defined. When defining extension functions that make sense only for a specific client, put them next to the code of that client. Do not create files just to hold "all extensions of Foo".
Source
I'd appreciate any commonly accepted source/reference explaining why it makes more sense to make the function a member of the class, and/or pragmatic arguments that support this separation.
That quote about using extension functions liberally, I'm pretty sure means use them liberally as opposed to top level non-extension functions (not as opposed to making it a member function). It's saying that if a top-level function conceptually works on a target object, prefer the extension function form.
I've searched before for the answer to why you might choose to make a function an extension function instead of a member function when working on a class you own the source code for, and have never found a canonical answer from JetBrains. Here are some reasons I think you might, but some are highly subject to opinion.
Sometimes you want a function that operates on a class with a specific generic type. Think of List<Int>.sum(), which is only available to a subset of Lists, but not a subtype of List.
Interfaces can be thought of as contracts. Functions that do something to an interface may make more sense conceptually since they are not part of the contract. I think this is the rationale for most of the standard library extension functions for Iterable and Sequence. A similar rationale might apply to a data class, if you think of a data class almost like a passive struct.
Extension functions afford the possibility of allowing users to pseudo-override them, but forcing them to do it in an independent way. Suppose your asXyzFormat() were an open member function. In some other module, you receive AddressDTO instances and want to get the XYZ format of them, exactly in the format you expect. But the AddressDTO you receive might have overridden asXyzFormat() and might give you something unexpected, so now you can't trust the function. If you use an extension function, then you allow users to replace asXyzFormat() in their own packages with something applicable to that space, but you can always trust the function asXyzFormat() in the source package.
Similarly for interfaces, a member function with default implementation invites users to override it. As the author of the interface, you may want a reliable function you can use on that interface with expected behavior. Although the end-user can hide your extension in their own module by overloading it, that will have no effect on your own uses of the function.
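The trust argument above can be illustrated in Java terms, where a static method taking the receiver as its first parameter plays the role of an extension function. This is a sketch under assumptions: Address, CustomAddress and asMemberFormat() are hypothetical names, and the class is left open on purpose.

```java
public class DispatchDemo {
    static class Address {
        final String city;
        Address(String city) { this.city = city; }

        // Member function: dispatched on the runtime type, so a subclass
        // can change what callers get back.
        String asMemberFormat() { return "city=" + city; }
    }

    static class CustomAddress extends Address {
        CustomAddress(String city) { super(city); }

        @Override
        String asMemberFormat() { return "???"; }
    }

    // Stand-in for an extension function: resolved statically, so a
    // subclass cannot change what this call does.
    static String asXyzFormat(Address address) { return "city=" + address.city; }

    public static void main(String[] args) {
        Address received = new CustomAddress("Oslo");
        System.out.println(received.asMemberFormat()); // uses the override
        System.out.println(asXyzFormat(received));     // always the original logic
    }
}
```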
For what it's worth, I think it would be very rare to choose to make an extension function for a class (not an interface) when you own the source code for it. And I can't think of any examples of that in the standard library. Which leads me to believe that the Coding Conventions document is using the word "class" in a liberal sense that includes interfaces.
Here's a reverse argument…
One of the main reasons for adding extension functions to the language is being able to add functionality to classes from the standard library, and from third-party libraries and other dependencies where you don't control the code and can't add member functions (AKA methods).  I suspect it's mainly those cases that that section of the coding conventions is talking about.
In Java, the only option in these cases is utility methods: static methods, usually in a utility class gathering together lots of such methods, each taking the relevant object as its first parameter:
public static String[] splitOnChar(String str, char separator)
public static boolean isAllDigits(String str)
…and so on, interminably.
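For concreteness, here is a sketch of such a utility class with the two signatures above filled in. The class name StringUtils and both method bodies are assumptions; only the signatures come from the text.

```java
import java.util.ArrayList;
import java.util.List;

public final class StringUtils {
    private StringUtils() {} // utility class: not meant to be instantiated

    // Splits str at every occurrence of separator, keeping empty parts.
    public static String[] splitOnChar(String str, char separator) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == separator) {
                parts.add(str.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(str.substring(start));
        return parts.toArray(new String[0]);
    }

    // True only for a non-empty string made up entirely of digits.
    public static boolean isAllDigits(String str) {
        if (str.isEmpty()) return false;
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isDigit(str.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The receiver is passed as the first argument at every call site.
        System.out.println(StringUtils.splitOnChar("a,b,c", ',').length);
        System.out.println(StringUtils.isAllDigits("2024"));
    }
}
```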
The main problem there is that such methods are hard to find (no help from the IDE unless you already know about all the various utility classes).  Also, calling them is long-winded (though it improved a bit once static imports were available).
Kotlin's extension methods are implemented exactly the same way down at the bytecode level, but their syntax is much simpler and exactly like member functions: they're written the same way (with this &c), calling them looks just like calling a member function, and your IDE will suggest them.
(Of course, they have drawbacks, too: no dynamic dispatch, no inheritance or overriding, scoping/import issues, name clashes, references to them are awkward, accessing them from Java or reflection is awkward, and so on.)
So: if the main purpose of extension functions is to substitute for member functions when member functions aren't possible, why would you use them when member functions are possible?!
(To be fair, there are a few reasons why you might want them.  For example, you can make the receiver nullable, which isn't possible with member functions.  But in most cases, they're greatly outweighed by the benefits of a proper member function.)
This means that the vast majority of extension functions are likely to be written for classes that you don't control the source code for, and so you don't have the option of putting them next to the class.

Why does Kotlin blindly change internal classes into public on the JVM?

As you know, private classes in Kotlin become package-private under the hood, and internal classes become public.
Unfortunately, this can lead to the known problem described here.
When the compiler generates bytecode, it could look at how each internal Kotlin class is used: choose package-private for internal classes that are not used outside their package and public for the others, so that we could handle the above problem on our own.
Or they could define another annotation, such as @JvmPackagePrivate, to put before internal classes to tell the compiler we want a package-private class in Java.
Or they could do both.
The question is: why don't they solve this obvious problem with such an obvious solution?
Do they have another approach to solving it?
I have only just gotten acquainted with Kotlin, so it seems to me that I can't create a library for Java in Kotlin, because when I create internal concrete classes, all clients can see them from outside the library, and that is a serious problem with Kotlin. Why can't they see this obvious problem?
I want to mention that none of the answers here solve this problem, because @JvmSynthetic and @JvmName only target functions in Kotlin, not classes, and in the end the classes are still visible even if their names are changed.
Finally, Kotlin claims to be completely interoperable with Java, but I don't think that's right. Better to say that it is 99 percent interoperable with Java :)

Finding the Pharo documentation for the compile & evaluate methods, etc. in the compiler class

I've got an embarrassingly simple question here. I'm a smalltalk newbie (I attempt to dabble with it every 5 years or so), and I've got Pharo 6.1 running. How do I go about finding the official standard library documentation? Especially for the compiler class? Things like the compile and evaluate methods? I don't see how to perform a search with the Help Browser, and the method comments in the compiler class are fairly terse and cryptic. I also don't see an obvious link to the standard library API documentation at: http://pharo.org/documentation. The books "Pharo by Example" and "Deep into Pharo" don't appear to cover that class either. I imagine the class is probably similar for Squeak and other smalltalks, so a link to their documentation for the compiler class could be helpful as well.
Thanks!
There are several classes that collaborate in the compilation of a method (or expression) and, given your interest in the subject, I'm tempted to stimulate you even further in their study and understanding.
Generally speaking, the main classes are the Scanner, the Parser, the Compiler and the Encoder. Depending on the dialect these may have slightly different names and implementations but the central idea remains the same.
The Scanner parses the stream of characters of the source code and produces a stream of tokens. These tokens are then parsed by the Parser, which transforms them into the nodes of the AST (Abstract Syntax Tree). Then the Compiler visits the nodes of the AST to analyze them semantically. Here all variable nodes are classified: method arguments, method temporaries, shared, block arguments, block temporaries, etc. It is during this analysis where all variables get bound in their corresponding scope. At this point the AST is no longer "abstract" as it has been annotated with binding information. Finally, the nodes are revisited to generate the literal frame and bytecodes of the compiled method.
Of course, there are lots of things I'm omitting from this summary (pragmas, block closures, etc.) but with these basic ideas in mind you should now be ready to debug a very simple example. For instance, start with
Object compile: 'm ^3'
to internalize the process.
After some stepping into and over, you will reach the first interesting piece of code, which is the method OpalCompiler >> #compile. If we remove the error handling blocks, this method speaks for itself:
compile
    | cm |
    ast := self parse.
    self doSemanticAnalysis.
    self callPlugins.
    cm := ast generate: self compilationContext compiledMethodTrailer.
    ^cm
First we have the #parse message, where the parse nodes are created. Then we have the semantic analysis I mentioned above, and finally #generate: produces the encoding. You should debug each of these methods to understand the compilation process in depth. Given that you are dealing with a tree, be prepared to navigate through a lot of visitors.
Once you become familiar with the main ideas, you may want to try more elaborate, yet still simple, examples to see other objects entering the scene.
Here are some simple facts:
Evaluation in Smalltalk is available everywhere: in workspaces, in the Transcript, in Browsers, inspectors, the debugger, etc.
Basically, if you are allowed to edit text, most likely you will also be allowed to evaluate it.
There are 4 evaluation commands
Do it (evaluates without showing the answer)
Print it (evaluates and prints the answer next to the expression)
Inspect it (evaluates and opens an inspector on the result)
Debug it (opens a debugger so you can evaluate your expression step by step).
Your expression can contain any literal (numbers, arrays, strings, characters, etc.)
17 "valid expression"
Your expression can contain any message.
3 + 4.
'Hello world' size.
1 bitShift: 28
Your expression can use any Global variable
Object new.
Smalltalk compiler
Your expression can reference self, super, true, nil, false.
SharedRandom globalGenerator next < 0.2 ifTrue: [nil] ifFalse: [self]
Your expression can use any variables declared in the context of the pane where you are writing. For example:
If you are writing in a class browser, self will be bound to the current class
If you are writing in an inspector, self is bound to the object under inspection. You can also use its instance variables in the expression.
If you are in the debugger, your expression can reference self, the instance variables, message arguments, temporaries, etc.
Finally, if you are in a workspace (a.k.a. Playground), you can use any temporaries there, which will be automatically created and remembered, without you having to declare them.
As near as I can tell, there is no API documentation for the Pharo standard library, like you find with other programming languages. This seems to be confirmed on the Pharo User's mailing list: http://forum.world.st/Essential-Documentation-td4916861.html
...there is a draft version of the ANSI standard available: http://wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf
...but that doesn't seem to cover the compiler class.

What does `ParseTreeListener` mean in ANTLR?

I read the following statement about ParseTreeListener from the book < The Definitive ANTLR 4 Reference >:
ANTLR generates a ParseTreeListener subclass specific to each grammar
with enter and exit methods for each rule.
I am a bit confused about the "each grammar" notion. My understanding is:
A language is equivalent to its grammar.
A grammar is just a set of rules.
A program is equivalent to a parse tree representing it.
So if we are working on a language application with ANTLR, there should be only one grammar. Thus there should be only one ParseTreeListener. So what does the each mean here?
ADD 1
As I read on, I have a feeling that the grammar here is merely specific to a *.g4 file. And maybe a language can have multiple *.g4 files. I am not sure if I am correct on this. I will keep updating this question.
After you define a .g4 grammar, you can tell ANTLR 4 to generate a class that implements ParseTreeListener. In more detail, say you have a combined grammar Lang; then ANTLR 4 generates an interface LangListener and a class LangBaseListener (for a separate parser grammar named LangParser, the names would be LangParserListener and LangParserBaseListener).
The interface defines all the enter and exit methods mentioned above.
The class LangBaseListener gives you a default 'no-operation' implementation of each of the methods (note that there are two methods for each parser rule in Lang.g4, so this could be a pretty large class/interface).
The main point of LangBaseListener is that it makes it easier to add a listener that only wants to 'listen' to a small subset of the rules. For that, simply inherit from it and override the respective methods.
And it makes perfect sense to implement multiple listeners per grammar, e.g. a first pass to define all occurring symbols and a second pass to resolve all symbolic references. This is also covered in the reference textbook on ANTLR 4.
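The two-pass idea can be sketched without the ANTLR runtime. Everything below is a hypothetical stand-in that only mirrors the shape of generated code: Node is not ANTLR's parse tree, and Listener, BaseListener and walk are simplified imitations of a generated listener interface, its no-op base class and a tree walker.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoPassListeners {
    // Hypothetical parse tree node: rule is "program", "decl" or "ref".
    static final class Node {
        final String rule;
        final String name;
        final List<Node> children;
        Node(String rule, String name, Node... children) {
            this.rule = rule;
            this.name = name;
            this.children = List.of(children);
        }
    }

    // One enter/exit pair per rule, as in a generated listener interface.
    interface Listener {
        void enterDecl(Node n);
        void exitDecl(Node n);
        void enterRef(Node n);
        void exitRef(Node n);
    }

    // No-op defaults, so a pass overrides only the callbacks it needs.
    static class BaseListener implements Listener {
        public void enterDecl(Node n) {}
        public void exitDecl(Node n) {}
        public void enterRef(Node n) {}
        public void exitRef(Node n) {}
    }

    // Depth-first walk that fires enter before and exit after the children.
    static void walk(Listener listener, Node n) {
        if (n.rule.equals("decl")) listener.enterDecl(n);
        if (n.rule.equals("ref")) listener.enterRef(n);
        for (Node child : n.children) walk(listener, child);
        if (n.rule.equals("decl")) listener.exitDecl(n);
        if (n.rule.equals("ref")) listener.exitRef(n);
    }

    // Pass 1: record every declared symbol.
    static final class DefineSymbols extends BaseListener {
        final Set<String> symbols = new HashSet<>();
        @Override public void enterDecl(Node n) { symbols.add(n.name); }
    }

    // Pass 2: report references to symbols that were never declared.
    static final class ResolveRefs extends BaseListener {
        final Set<String> symbols;
        final List<String> unresolved = new ArrayList<>();
        ResolveRefs(Set<String> symbols) { this.symbols = symbols; }
        @Override public void enterRef(Node n) {
            if (!symbols.contains(n.name)) unresolved.add(n.name);
        }
    }

    static List<String> check(Node tree) {
        DefineSymbols define = new DefineSymbols();
        walk(define, tree);
        ResolveRefs resolve = new ResolveRefs(define.symbols);
        walk(resolve, tree);
        return resolve.unresolved;
    }

    public static void main(String[] args) {
        Node tree = new Node("program", null,
            new Node("ref", "y"), new Node("decl", "y"), new Node("ref", "z"));
        System.out.println(check(tree));
    }
}
```

Because the definitions are collected in a separate first pass, the forward reference to y resolves even though it appears before the declaration; only z is reported.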
hope that helps
fricke

Check if a class implements an interface at run-time

Say FrameworkA consumes a FrameworkA.StandardLogger class for logging. I want to replace the logging library by another one (the SuperLogger class).
To make that possible, there are interfaces: FrameworkA will provide a FrameworkA.Logger interface that other libraries have to implement.
But what if other libraries don't implement that interface? FrameworkA might not be popular enough for SuperLogger to care about its interface.
Possible solutions are:
have a standardized interface (defined by standards like JSR, PSR, ...)
write adapters
What if there is no standardized interface, and you want to avoid the pain of writing useless adapters if classes are compatible?
Couldn't there be another solution to ensure a class meets a contract, but at runtime?
Imagine (very simple implementation in pseudo-code):
namespace FrameworkA;
interface Logger {
    void log(message);
}
namespace SuperLoggingLibrary;
class SupperLogger {
    void log(message) {
        // ...
    }
}
SupperLogger would be compatible with Logger if only it implemented the Logger interface. But instead of having a "hard dependency" on FrameworkA.Logger, its public "interface" (or signature) could be verified at runtime:
// Something verifies that SupperLogger implements Logger at run-time
Logger logger = new SupperLogger();
// setLogger() expects a Logger, all works
myFrameworkAConfiguration.setLogger(logger);
In the fake scenario, I expect the Logger logger = new SupperLogger() to fail at run-time if the class is not compatible with the interface, but to succeed if it is.
Would that be a valid thing in OOP? If yes, does it exist in any language? If no, why is it not valid?
My question stands for statically-typed languages (Java, ...) or dynamically typed languages (PHP, ...).
For PHP & al: I know when there is no type-check you can use any object you want even if it doesn't implement the interface, but I'd be interested in something that actually checks that the object complies with the interface.
This is called duck typing, a concept that you will find in Ruby ("it walks like a duck, it quacks like a duck, it must be a duck")
In other dynamically typed languages you can simulate it, for example in PHP with method_exists. In statically typed languages there might be workarounds with reflection, a search for "duck typing +language" will help to find them.
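As one possible reflection workaround in Java, the sketch below checks at run-time that a class offers every method of an interface and then wraps the object in a dynamic proxy. The helper names conforms and adapt are assumptions made for this example, and the check is deliberately simple: it compares method names, parameter types and return types only.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class DuckTyping {
    public interface Logger { void log(String message); }

    // Deliberately does NOT implement Logger.
    public static class SupperLogger {
        public String last;
        public void log(String message) { last = message; }
    }

    // True if candidate has a public method matching each method of iface.
    static boolean conforms(Class<?> candidate, Class<?> iface) {
        for (Method required : iface.getMethods()) {
            try {
                Method found = candidate.getMethod(required.getName(), required.getParameterTypes());
                if (!required.getReturnType().isAssignableFrom(found.getReturnType())) return false;
            } catch (NoSuchMethodException e) {
                return false;
            }
        }
        return true;
    }

    // Fails at run-time if target is not compatible; otherwise returns a
    // proxy that implements iface and forwards every call to target.
    static <T> T adapt(Object target, Class<T> iface) {
        if (!conforms(target.getClass(), iface)) {
            throw new ClassCastException(target.getClass().getName()
                + " is not compatible with " + iface.getName());
        }
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] { iface },
            (self, method, args) -> {
                Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
                try {
                    return impl.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the target itself threw
                }
            });
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        SupperLogger supper = new SupperLogger();
        Logger logger = adapt(supper, Logger.class);
        logger.log("hello");
        System.out.println(supper.last);
    }
}
```

The trade-off is that compatibility is only discovered when adapt runs, and every call goes through reflection, which is why a hand-written adapter is often still preferred.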
This is more of a static typing issue than an OOP one. Both Java and Ruby are OO languages, but Java wouldn't allow what you want (as it's statically typed), whereas Ruby would (as it's dynamically typed).
From a statically typed language's point of view, one of the major (if not the major) advantages is knowing at compile time whether an assignment is safe and valid. What you're looking for is provided by dynamically typed languages (such as Ruby), but isn't possible in a statically typed language like Java, and this is by design (compile-time safety).
You can, but it is ugly, do something like (in Java):
Object objLogger = new SupperLogger();
Logger logger = (Logger)objLogger;
This would pass at compile time but would fail at runtime if the assignment was invalid.
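A small self-contained version of that cast shows the run-time failure. The nested Logger and SupperLogger types and the helper castSucceeds are stand-ins written for this sketch.

```java
public class CastDemo {
    interface Logger { void log(String message); }

    // Has the right method, but does not declare "implements Logger".
    static class SupperLogger {
        public void log(String message) {}
    }

    // Attempts the cast and reports whether the JVM accepted it.
    static boolean castSucceeds(Object candidate) {
        try {
            Logger logger = (Logger) candidate; // compiles: an Object may be anything
            logger.log("checked");
            return true;
        } catch (ClassCastException e) {
            return false; // Java checks the declared type, not the method shape
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds(new SupperLogger()));
    }
}
```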
That said, the above is pretty ugly and isn't something I would do: it doesn't give you much and risks an unpleasant (and possibly surprising) exception at runtime.
I guess the best you could hope for in a language like Java would be to abstract the creation away from where you want to use it:
Logger logger = getLogger();
With the internals of getLogger deciding what to return. This however just defers the actual creation to further down - you'll still have to do so in a statically typed safe way.