Is there a way to provide a wrapper of ParseTree to the ANTLR 4 parser?

In ANTLR 2 I could set a custom AST node type:
parser.setASTNodeClass(DetailAST.class.getName());
I know that ANTLR 4 has no AST, but I want to add some functionality to all nodes in the parse tree, for example: getNextSibling, getPreviousSibling, getType, getLine, getColumn, etc.
I don't want to cast every node to YYYContext to work with it. Instead, I want to create a base class for all nodes in the parse tree that provides all these methods.

As of ANTLR 4.4, there is no way to override the types used for the parse tree. An issue does exist for discussions on the subject, but (as of today) the feature has not been implemented or even assigned to a target release milestone.
https://github.com/antlr/antlr4/issues/30
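Until such a feature exists, one workaround is to keep the navigation logic in a static helper that works on any node, so no cast to YYYContext is needed. Below is a minimal sketch, assuming only the getParent, getChild and getChildCount methods that ANTLR 4's Tree interface declares. The Tree interface here is a stand-in so the example runs without the ANTLR runtime, and SimpleNode and TreeNav are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for ANTLR's org.antlr.v4.runtime.tree.Tree; only the three
// navigation methods used below are mirrored here.
interface Tree {
    Tree getParent();
    Tree getChild(int i);
    int getChildCount();
}

// Hypothetical minimal node so the sketch runs without the ANTLR runtime.
class SimpleNode implements Tree {
    final String name;
    private SimpleNode parent;
    private final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) { this.name = name; }

    SimpleNode add(SimpleNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    @Override public Tree getParent() { return parent; }
    @Override public Tree getChild(int i) { return children.get(i); }
    @Override public int getChildCount() { return children.size(); }
}

// Hypothetical helper: sibling navigation that works on any node type.
final class TreeNav {
    private TreeNav() {}

    // Index of the node within its parent, or -1 for the root.
    static int indexInParent(Tree node) {
        Tree parent = node.getParent();
        if (parent == null) return -1;
        for (int i = 0; i < parent.getChildCount(); i++) {
            if (parent.getChild(i) == node) return i;
        }
        return -1;
    }

    // Next sibling, or null if the node is the last child or the root.
    static Tree nextSibling(Tree node) {
        int i = indexInParent(node);
        if (i < 0 || i + 1 >= node.getParent().getChildCount()) return null;
        return node.getParent().getChild(i + 1);
    }

    // Previous sibling, or null if the node is the first child or the root.
    static Tree previousSibling(Tree node) {
        int i = indexInParent(node);
        return i <= 0 ? null : node.getParent().getChild(i - 1);
    }
}
```

With the real runtime the same helper would take ParseTree arguments, and line and column can be read from a ParserRuleContext via getStart().getLine() and getStart().getCharPositionInLine().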

Related

Questions About AST Validation

I’m writing a transpiler and came across the topic of validating the input. I have some questions, but would also like to double-check that I understood everything correctly. To my understanding, there are 3 main validations you have to do when transpiling (or compiling) a programming language:
Syntax/grammar validation. This is done in my case by ANTLR which makes sure the input respects the BNF grammar.
Context validation. ANTLR only makes sure the input respects the grammar, but the grammar is context-free. For example, the Java grammar allows the public, private, and protected access modifiers on a class, so it will accept a class that has all 3 of them; it does not know that a class should have at most one. This second validation makes sure that, for example, a class does not have more than one access modifier. I imagine I can do this with a visitor over my AST, right?
Dependencies/references validation. Check that we have, for example, all the classes which are declared as import statements in the current compilation unit - this also seems fairly easy, but what do you do about method references/calls to 3rd party classes? Say, for example, your code calls a class from JDK – how do you check that a reference to that class is correct, do you need to also compile that class and add it to your AST?
For example, you can use java.util.List in Kotlin. How does the Kotlin compiler know to tell you if you are using a String instead of an Integer when calling List.get(int index)? Does the Kotlin compiler also compile the java.util.List interface?
Thank you for reading, any response is appreciated.
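To make the second kind of validation concrete, here is a rough sketch of the access-modifier check as a pass over already-parsed class declarations. ClassDecl and ModifierCheck are hypothetical names, and a real transpiler would run this from a visitor over whatever AST its parser produces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical AST node for a class declaration; a real AST would come from the parser.
final class ClassDecl {
    final String name;
    final List<String> modifiers;

    ClassDecl(String name, List<String> modifiers) {
        this.name = name;
        this.modifiers = modifiers;
    }
}

final class ModifierCheck {
    private static final Set<String> ACCESS = Set.of("public", "protected", "private");

    private ModifierCheck() {}

    // Returns one error message per class that carries more than one access modifier.
    static List<String> validate(List<ClassDecl> classes) {
        List<String> errors = new ArrayList<>();
        for (ClassDecl c : classes) {
            long count = c.modifiers.stream().filter(ACCESS::contains).count();
            if (count > 1) {
                errors.add("class " + c.name + " has " + count + " access modifiers");
            }
        }
        return errors;
    }
}
```

The grammar accepts both declarations in a program such as "public final class A" and "public private class B"; only this later pass rejects B.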

Finding the Pharo documentation for the compile & evaluate methods, etc. in the compiler class

I've got an embarrassingly simple question here. I'm a Smalltalk newbie (I attempt to dabble with it every 5 years or so), and I've got Pharo 6.1 running. How do I go about finding the official standard library documentation, especially for the compiler class and things like the compile and evaluate methods? I don't see how to perform a search with the Help Browser, and the method comments in the compiler class are fairly terse and cryptic. I also don't see an obvious link to the standard library API documentation at: http://pharo.org/documentation. The books "Pharo by Example" and "Deep into Pharo" don't appear to cover that class either. I imagine the class is probably similar for Squeak and other Smalltalks, so a link to their documentation for the compiler class could be helpful as well.
Thanks!
There are several classes that collaborate in the compilation of a method (or expression) and, given your interest in the subject, I'm tempted to stimulate you even further in their study and understanding.
Generally speaking, the main classes are the Scanner, the Parser, the Compiler and the Encoder. Depending on the dialect these may have slightly different names and implementations but the central idea remains the same.
The Scanner parses the stream of characters of the source code and produces a stream of tokens. These tokens are then parsed by the Parser, which transforms them into the nodes of the AST (Abstract Syntax Tree). Then the Compiler visits the nodes of the AST to analyze them semantically. Here all variable nodes are classified: method arguments, method temporaries, shared, block arguments, block temporaries, etc. It is during this analysis where all variables get bound in their corresponding scope. At this point the AST is no longer "abstract" as it has been annotated with binding information. Finally, the nodes are revisited to generate the literal frame and bytecodes of the compiled method.
Of course, there are lots of things I'm omitting from this summary (pragmas, block closures, etc.) but with these basic ideas in mind you should now be ready to debug a very simple example. For instance, start with
Object compile: 'm ^3'
to internalize the process.
After some stepping into and over, you will reach the first interesting piece of code, which is the method OpalCompiler >> #compile. If we remove the error handling blocks, this method speaks for itself:
compile
| cm |
ast := self parse.
self doSemanticAnalysis.
self callPlugins.
cm := ast generate: self compilationContext compiledMethodTrailer.
^cm
First we have the #parse message where the parse nodes are created. Then we have the semantic analysis I mentioned above, and finally #generate: produces the encoding. You should debug each of these methods to understand the compilation process in depth. Given that you are dealing with a tree, be prepared to navigate through a lot of visitors.
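For readers who want to see the same parse / analyse / generate shape outside the image, here is a toy sketch in Java. Every name in it is hypothetical and it is not OpalCompiler's API. It only handles sums of integers, and its semantic analysis phase is empty because the toy language has no variables to bind:

```java
import java.util.ArrayList;
import java.util.List;

// Toy pipeline with hypothetical names; it mirrors the parse / analyse / generate
// shape described above and is not OpalCompiler's API.
final class ToyCompiler {
    // AST node: either a number literal or a binary "+" node.
    static final class Node {
        final String op;      // "num" or "+"
        final int value;      // only meaningful when op is "num"
        final Node left, right;
        Node(int value) { this.op = "num"; this.value = value; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.op = "+"; this.value = 0; this.left = left; this.right = right; }
    }

    private ToyCompiler() {}

    // Scanner: turns the characters of the source into whitespace-separated tokens.
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Parser: turns the tokens into a left-associative AST.
    static Node parse(List<String> tokens) {
        if (tokens.isEmpty()) throw new IllegalArgumentException("empty input");
        Node ast = new Node(Integer.parseInt(tokens.get(0)));
        for (int i = 1; i < tokens.size(); i += 2) {
            if (!tokens.get(i).equals("+") || i + 1 >= tokens.size()) {
                throw new IllegalArgumentException("expected '+ number' at token " + i);
            }
            ast = new Node(ast, new Node(Integer.parseInt(tokens.get(i + 1))));
        }
        return ast;
    }

    // Semantic analysis would go here; the toy language has nothing to bind.

    // Code generation: walks the AST and emits stack-machine instructions.
    static void generate(Node node, List<String> out) {
        if (node.op.equals("num")) {
            out.add("push " + node.value);
        } else {
            generate(node.left, out);
            generate(node.right, out);
            out.add("add");
        }
    }

    static List<String> compile(String source) {
        Node ast = parse(scan(source));
        List<String> code = new ArrayList<>();
        generate(ast, code);
        return code;
    }
}
```

Calling ToyCompiler.compile("3 + 4") yields the instruction list push 3, push 4, add.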
Once you become familiar with the main ideas, you may want to try more elaborate, yet still simple, examples to see other objects entering the scene.
Here are some simple facts:
Evaluation in Smalltalk is available everywhere: in workspaces, in the Transcript, in Browsers, inspectors, the debugger, etc. Basically, if you are allowed to edit text, most likely you will also be allowed to evaluate it.
There are 4 evaluation commands:
Do it (evaluates without showing the answer)
Print it (evaluates and prints the answer next to the expression)
Inspect it (evaluates and opens an inspector on the result)
Debug it (opens a debugger so you can evaluate your expression step by step).
Your expression can contain any literal (numbers, arrays, strings, characters, etc.)
17 "valid expression"
Your expression can contain any message.
3 + 4.
'Hello world' size.
1 bitShift: 28
Your expression can use any Global variable
Object new.
Smalltalk compiler
Your expression can reference self, super, true, nil, false.
SharedRandom globalGenerator next < 0.2 ifTrue: [nil] ifFalse: [self]
Your expression can use any variables declared in the context of the pane where you are writing. For example:
If you are writing in a class browser, self will be bound to the current class
If you are writing in an inspector, self is bound to the object under inspection. You can also use its instance variables in the expression.
If you are in the debugger, your expression can reference self, the instance variables, message arguments, temporaries, etc.
Finally, if you are in a workspace (a.k.a. Playground), you can use any temporaries there, which will be automatically created and remembered, without you having to declare them.
As near as I can tell, there is no API documentation for the Pharo standard library, like you find with other programming languages. This seems to be confirmed on the Pharo User's mailing list: http://forum.world.st/Essential-Documentation-td4916861.html
...there is a draft version of the ANSI standard available: http://wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf
...but that doesn't seem to cover the compiler class.

How can I get the public elements from a Rust module?

In Node.js, I could get an array of the objects in foo with
Object.keys(require("foo"));
Is there any way I could do the same thing in Rust?
mod foo;
getobjs(foo);
No, there is no way to do this. This level of introspection of compile-time information simply doesn't exist at runtime. The concept of a module doesn't even exist in the compiled program.
If you are interested in compile-time information, you can do such things as build and view the docs (cargo doc --open) to see all the public items of the entire crate. You can probably also view the crate's documentation online before you use it.
There are also tools like the Rust Language Server which provide this type of information (and more) to editors and IDEs.

modify a Kotlin class

I'd like to write a plugin for IntelliJ IDEA that modifies Java and Kotlin code.
I use the method
PsiClass.getMethods()
in order to get all methods of Java and Kotlin classes. So far so good, so then I use methods like
PsiClass.add(), PsiClass.addAfter(), PsiClass.addBefore()
that all work fine when called on Java classes, but throw an
IncorrectOperationException
when called on a Kotlin class.
I'd appreciate any hint on how I can modify Kotlin and Java classes (preferably using the same approach).
When you search for a Kotlin class via the JavaPsiFacade, it returns the light class which is a shallow representation that is just based on the information in the class file. In order to add PSI elements, you have to call navigationElement on it. Then, IJ will parse the source and build a full PSI tree that can be modified.
However, if the class is a Kotlin class, navigationElement will return a KtClass, which is not derived from PsiClass. You will have to use the facilities in the Kotlin PSI hierarchy to modify it. Functions in Kotlin source are also not instances of PsiMethod, but instances of KtNamedFunction.
For analyzing Java and Kotlin sources in a common fashion there is a different syntax tree called "UAST", but for modifications you need a language-specific approach.

What does `ParseTreeListener` mean in ANTLR?

I read the following statement about ParseTreeListener in the book "The Definitive ANTLR 4 Reference":
ANTLR generates a ParseTreeListener subclass specific to each grammar
with enter and exit methods for each rule.
I am a bit confused about the "each grammar" notion. My understanding is:
A language is equivalent to its grammar.
A grammar is just a set of rules.
A program is equivalent to a parse tree representing it.
So if we are working on a language application with ANTLR, there should be only one grammar. Thus there should be only one ParseTreeListener. So what does "each" mean here?
ADD 1
As I read on, I have a feeling that the grammar here is merely specific to a *.g4 file. And maybe a language can have multiple *.g4 files. I am not sure if I am correct on this. I will keep updating this question.
After you define a .g4 grammar, you can tell ANTLR 4 to generate a listener for it. In more detail, say you have a combined grammar Lang; then ANTLR 4 generates an interface LangListener, which extends ParseTreeListener, and a class LangBaseListener. (If the parser grammar itself is named LangParser, the generated names are LangParserListener and LangParserBaseListener.)
The interface defines all the enter- and exit-methods as mentioned above.
The class LangBaseListener gives you a default 'no-operation' implementation for each of the methods (note that there are two methods for each rule in Lang.g4, so this could be a pretty large class/interface).
The main point of LangBaseListener is that it makes it easier to add a listener that only wants to 'listen' to a small subset of the rules. For that, simply inherit from it and override the respective methods.
And it does make perfect sense to implement multiple listeners per grammar, e.g. a first pass to define all occurring symbols and a second pass to de-reference all symbolic references. This is also covered in the reference textbook on ANTLR 4.
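To illustrate the two-pass idea without needing a generated parser, here is a rough, self-contained sketch. LangListener and LangBaseListener below are hand-written stand-ins for what ANTLR would generate from a grammar with two rules, decl and ref; real generated methods take context objects rather than strings. DefinePass, ResolvePass and ToyWalker are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hand-written stand-in for a generated listener interface:
// one enter and one exit method per rule.
interface LangListener {
    void enterDecl(String name);
    void exitDecl(String name);
    void enterRef(String name);
    void exitRef(String name);
}

// No-op defaults, so a subclass overrides only the callbacks it cares about.
class LangBaseListener implements LangListener {
    @Override public void enterDecl(String name) {}
    @Override public void exitDecl(String name) {}
    @Override public void enterRef(String name) {}
    @Override public void exitRef(String name) {}
}

// First pass: collect every declared symbol.
class DefinePass extends LangBaseListener {
    final Set<String> symbols = new HashSet<>();
    @Override public void enterDecl(String name) { symbols.add(name); }
}

// Second pass: record references to symbols the first pass did not see.
class ResolvePass extends LangBaseListener {
    final List<String> unresolved = new ArrayList<>();
    private final Set<String> symbols;
    ResolvePass(Set<String> symbols) { this.symbols = symbols; }
    @Override public void enterRef(String name) {
        if (!symbols.contains(name)) unresolved.add(name);
    }
}

// Toy walker over a flat "program": each entry is "decl x" or "ref x".
final class ToyWalker {
    private ToyWalker() {}
    static void walk(LangListener listener, List<String> program) {
        for (String stmt : program) {
            String[] parts = stmt.split(" ");
            if (parts[0].equals("decl")) {
                listener.enterDecl(parts[1]);
                listener.exitDecl(parts[1]);
            } else {
                listener.enterRef(parts[1]);
                listener.exitRef(parts[1]);
            }
        }
    }
}
```

Walking the same program first with a DefinePass and then with a ResolvePass built from its symbols lets a reference that appears before its declaration still resolve, which is the reason for using two listeners rather than one.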
hope that helps
fricke