I read the following statement about ParseTreeListener in the book "The Definitive ANTLR 4 Reference":
ANTLR generates a ParseTreeListener subclass specific to each grammar
with enter and exit methods for each rule.
I am a bit confused about the each grammar notion. My understanding is:
A language is equivalent to its grammar.
A grammar is just a set of rules.
A program is equivalent to a parse tree representing it.
So if we are working on a language application with ANTLR, there should be only one grammar, and thus only one ParseTreeListener. So what does each mean here?
Update 1
As I read on, I get the feeling that grammar here simply refers to a single *.g4 file, and that a language may be described by multiple *.g4 files. I am not sure if I am correct on this. I will keep updating this question.
After you define a .g4 grammar, you can tell Antlr4 to generate a class that implements ParseTreeListener. In more detail, say you have a grammar Lang; Antlr4 then generates an interface LangParserListener and a class LangParserBaseListener.
The interface defines all the enter- and exit-methods as mentioned above.
The class LangParserBaseListener gives you a default 'no-operation' implementation of each of these methods (note that there are two methods for each rule in Lang.g4, so this can be a pretty large class/interface).
The main point of LangParserBaseListener is that it makes it easy to write a listener that only needs to 'listen' to a small subset of the rules: simply inherit from it and override the respective methods.
And it makes perfect sense to implement multiple listeners per grammar, e.g. a first pass to define all occurring symbols and a second pass to resolve all symbolic references. This approach is also covered in the reference textbook on Antlr4.
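The base-listener pattern is easy to sketch outside ANTLR. Here is a minimal Python illustration (class and rule names are invented for this sketch, not ANTLR's actual generated code): the base class supplies a no-op enter/exit method per rule, so a concrete listener only overrides the rules it cares about.

```python
# Sketch of the generated-listener pattern (invented names, not ANTLR's
# actual output): the base class supplies a no-op method per rule, so a
# subclass only overrides the rules it cares about.
class LangBaseListener:
    def enter_expr(self, ctx): pass
    def exit_expr(self, ctx): pass
    def enter_stat(self, ctx): pass
    def exit_stat(self, ctx): pass

class SymbolCollector(LangBaseListener):
    """First pass: record every statement node, ignore everything else."""
    def __init__(self):
        self.symbols = []

    def enter_stat(self, ctx):  # the only rule this listener cares about
        self.symbols.append(ctx)

# A trivial 'walker' driving the listener over pseudo parse-tree nodes:
collector = SymbolCollector()
for node in ["x = 1", "y = 2"]:
    collector.enter_stat(node)
    collector.exit_stat(node)
print(collector.symbols)
```

A second listener (the de-referencing pass mentioned above) would subclass the same base and override a different subset of methods.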
hope that helps
fricke
Related
I’m writing a transpiler and came across the topic of validating the input. I have some questions, but would also like to double-check that I understood everything correctly. To my understanding, there are 3 main validations you have to do when transpiling (or compiling) a programming language:
Syntax/grammar validation. This is done in my case by ANTLR which makes sure the input respects the BNF grammar.
Context validation. ANTLR only makes sure the input respects the grammar, but the grammar is context-free: for example, the grammar of Java allows the public, private, and protected access modifiers on a class, but it would also allow a class to have all 3 of them; it doesn’t know that a class should have at most one. So this second validation makes sure that, for example, a class does not have more than one access modifier. I imagine I can implement this as a visitor pattern on my AST, right?
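A visitor over the AST is indeed a common way to do this kind of check. A minimal Python sketch (the node class and validator here are invented for illustration; a real ANTLR visitor would subclass the generated *Visitor class):

```python
# Minimal sketch of context validation as an AST visitor. The node class
# and validator are hypothetical, not ANTLR's generated code.
class ClassDecl:
    def __init__(self, name, modifiers):
        self.name = name
        self.modifiers = modifiers

class ContextValidator:
    ACCESS_MODIFIERS = {"public", "private", "protected"}

    def __init__(self):
        self.errors = []

    def visit_class(self, node):
        # A class may carry at most one access modifier, even though a
        # context-free grammar happily accepts all three at once.
        access = [m for m in node.modifiers if m in self.ACCESS_MODIFIERS]
        if len(access) > 1:
            self.errors.append(
                f"class {node.name}: conflicting access modifiers {access}")

validator = ContextValidator()
validator.visit_class(ClassDecl("Foo", ["public", "private"]))  # invalid
validator.visit_class(ClassDecl("Bar", ["public", "final"]))    # fine
print(validator.errors)
```

The same validator can accumulate any number of context-sensitive rules while walking the tree once.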
Dependencies/references validation. Check that we have, for example, all the classes declared in import statements in the current compilation unit - this also seems fairly easy, but what do you do about method references/calls to 3rd party classes? Say, for example, your code calls a class from the JDK – how do you check that a reference to that class is correct, do you need to also compile that class and add it to your AST?
For example, you can use java.util.List in Kotlin. How does the Kotlin compiler know to tell you if you are using a String instead of an Integer when calling List.get(int index)? Does the Kotlin compiler also compile the java.util.List interface?
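As far as I know, compilers generally do not recompile library sources for this; they read the type signatures recorded in the already-compiled artifact (e.g. Java class-file metadata) and check calls against them. A toy Python sketch of that idea (all names and structures here are invented for illustration):

```python
# Toy sketch: instead of recompiling java.util.List, a compiler loads the
# signatures recorded in its compiled class file and checks each call site
# against them. The table and its format are invented for illustration.
SIGNATURES = {
    ("java.util.List", "get"): (["int"], "java.lang.Object"),
}

def check_call(cls, method, arg_types):
    """Return None if the call matches the recorded signature, else an error."""
    params, _return_type = SIGNATURES[(cls, method)]
    if arg_types != params:
        return f"{cls}.{method}: expected {params}, got {arg_types}"
    return None

print(check_call("java.util.List", "get", ["int"]))                # OK
print(check_call("java.util.List", "get", ["java.lang.String"]))   # mismatch
```

This is roughly why the Kotlin compiler can flag a String argument to List.get without compiling the JDK sources: it only needs the compiled interface's signatures.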
Thank you for reading, any response is appreciated.
I've got an embarrassingly simple question here. I'm a Smalltalk newbie (I attempt to dabble with it every 5 years or so), and I've got Pharo 6.1 running. How do I go about finding the official standard library documentation? Especially for the compiler class? Things like the compile and evaluate methods? I don't see how to perform a search with the Help Browser, and the method comments in the compiler class are fairly terse and cryptic. I also don't see an obvious link to the standard library API documentation at: http://pharo.org/documentation. The books "Pharo by Example" and "Deep into Pharo" don't appear to cover that class either. I imagine the class is probably similar for Squeak and other Smalltalks, so a link to their documentation for the compiler class could be helpful as well.
Thanks!
There are several classes that collaborate in the compilation of a method (or expression) and, given your interest in the subject, I'm tempted to stimulate you even further in their study and understanding.
Generally speaking, the main classes are the Scanner, the Parser, the Compiler and the Encoder. Depending on the dialect these may have slightly different names and implementations but the central idea remains the same.
The Scanner reads the stream of characters of the source code and produces a stream of tokens. These tokens are then parsed by the Parser, which transforms them into the nodes of the AST (Abstract Syntax Tree). Then the Compiler visits the nodes of the AST to analyze them semantically. Here all variable nodes are classified: method arguments, method temporaries, shared, block arguments, block temporaries, etc. It is during this analysis that all variables get bound in their corresponding scope. At this point the AST is no longer "abstract", as it has been annotated with binding information. Finally, the nodes are revisited to generate the literal frame and bytecodes of the compiled method.
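The scan -> parse -> analyze -> generate pipeline described above can be sketched in a few lines of Python. This is a toy illustration for a made-up one-method language, not Pharo's actual classes or bytecode:

```python
# Toy illustration of the scan -> parse -> analyze -> generate pipeline
# described above (invented mini-language, not Pharo's actual classes).
import re

def scan(source):
    # Scanner: characters -> tokens
    return re.findall(r"\^|[A-Za-z]+|\d+", source)

def parse(tokens):
    # Parser: tokens -> a tiny "AST" for methods of the form 'name ^literal'
    name, caret, literal = tokens
    assert caret == "^"
    return {"method": name, "body": ("return", literal)}

def analyze(ast):
    # Semantic analysis: classify the returned value (literal vs variable)
    _, value = ast["body"]
    ast["kind"] = "literal" if value.isdigit() else "variable"
    return ast

def generate(ast):
    # Code generation: emit symbolic "bytecodes"
    return ["push " + ast["body"][1], "returnTop"]

ast = analyze(parse(scan("m ^3")))
print(generate(ast))
```

Each stage here corresponds to one of the collaborators above: scan to the Scanner, parse to the Parser, analyze to the semantic pass, and generate to the Encoder.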
Of course, there are lots of things I'm omitting from this summary (pragmas, block closures, etc.) but with these basic ideas in mind you should now be ready to debug a very simple example. For instance, start with
Object compile: 'm ^3'
to internalize the process.
After some stepping into and over, you will reach the first interesting piece of code, which is the method OpalCompiler >> #compile. If we remove the error handling blocks, this method speaks for itself:
compile
    | cm |
    ast := self parse.
    self doSemanticAnalysis.
    self callPlugins.
    cm := ast generate: self compilationContext compiledMethodTrailer.
    ^cm
First we have the #parse message, where the parse nodes are created. Then we have the semantic analysis I mentioned above, and finally #generate: produces the encoding. You should debug each of these methods to understand the compilation process in depth. Given that you are dealing with a tree, be prepared to navigate through a lot of visitors.
Once you become familiar with the main ideas, you may want to try more elaborate (yet simple) examples to see other objects entering the scene.
Here are some simple facts:
Evaluation in Smalltalk is available everywhere: in workspaces, in the Transcript, in browsers, inspectors, the debugger, etc. Basically, if you are allowed to edit text, most likely you will also be allowed to evaluate it.
There are 4 evaluation commands:
Do it (evaluates without showing the answer)
Print it (evaluates and prints the answer next to the expression)
Inspect it (evaluates and opens an inspector on the result)
Debug it (opens a debugger so you can evaluate your expression step by step).
Your expression can contain any literal (numbers, arrays, strings, characters, etc.)
17 "valid expression"
Your expression can contain any message.
3 + 4.
'Hello world' size.
1 bitShift: 28
Your expression can use any global variable
Object new.
Smalltalk compiler
Your expression can reference self, super, true, nil, false.
SharedRandom globalGenerator next < 0.2 ifTrue: [nil] ifFalse: [self]
Your expression can use any variables declared in the context of the pane where you are writing. For example:
If you are writing in a class browser, self will be bound to the current class
If you are writing in an inspector, self is bound to the object under inspection. You can also use its instance variables in the expression.
If you are in the debugger, your expression can reference self, the instance variables, message arguments, temporaries, etc.
Finally, if you are in a workspace (a.k.a. Playground), you can use any temporaries there, which will be automatically created and remembered, without you having to declare them.
As near as I can tell, there is no API documentation for the Pharo standard library, like you find with other programming languages. This seems to be confirmed on the Pharo User's mailing list: http://forum.world.st/Essential-Documentation-td4916861.html
...there is a draft version of the ANSI standard available: http://wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf
...but that doesn't seem to cover the compiler class.
As a self-taught programmer, my definitions get fuzzy sometimes.
I'm very used to C and ObjC. In both of those your code must adhere to the language "structure". You can only do certain things in certain places. As an example, this is an error:
// beginning of file
NSLog(@"Hello world!"); // can't do this
@implementation MYClass
...
@end
However, in Ruby, anything you put anywhere is executed as the interpreter goes through it. So what is the difference between Ruby and Objective-C that allows this?
At first I thought it was that one was interpreted and the other compiled. Then I read some SO posts and the wikipedia definitions. Interpreted or compiled is a property of the implementation not the language. So that would mean there could (theoretically) be an interpreted implementation of Objective-C? In that case, the fact that a statement cannot be outside the implementation can't be a property of compiled languages, and vice-versa if there was a compiled implementation of Ruby. Or am I wrong in assuming that different implementations of a language would work the same way?
I'm not sure there's a technical term for it, but in most programming languages the context of the statement is extremely important.
Ruby has a concept of a root or main context where code is allowed. Other scripting languages follow this convention, presumably made popular by languages like Perl which allowed for very concise programming.
This allows things like this to be a complete and valid program:
print "Hello world!\n"
In other languages you need to define an entry point, such as a main routine, that is executed instead. Arbitrary code is not really allowed at the top level, which instead is reserved for things like function, type, constant, structure and class definitions.
A language like Ruby has a lot of control over the order in which the code is executed. C, by comparison, is usually composed of separate source files that are then linked together, where there's no inherent order to the way things are linked. All the modules are simply assembled into the final library or executable. This is why the main entry point is required, it defines which function to run first.
In short, it boils down to syntax, context, and language design considerations.
Ruby hides lots of stuff.
Ruby is OO like C++, Objective-C and Java, and has a main like C, but you don't see it.
puts(42) is a method call. It is called on the top-level object, called main. You can see this by typing puts self.
If you don't specify the receiver (receiver.method()), Ruby will use the implicit one, main.
Check available methods:
puts Object.private_methods.sort
Why can you put anything anywhere?
C/C++ looks for a function called main and, when it finds it, executes it.
Ruby, on the other hand, doesn't need main or any other method/class to run first.
It executes code from the first line until it meets the end of the file (or __END__ on a separate line).
class Strongman
puts "I'm the best!"
end
is just syntactic sugar for a Class.new call:
Strongman = Class.new do
puts "I'm the best!"
end
The same goes for `module`.
for calls each and returns some kind of object, so you may think of it as something similar to a method call.
a = for i in 1..12; 42;end
puts a
# 1..12
In the end, it doesn't matter whether it is a method call or some structure like C's int main(); the programming language decides what runs first.
I am looking at Objective-C and I notice, for example, that a class-interface declaration begins with @interface. Fine, no problema. The text therefore suggests no space is permitted between the @ and interface. However, when I pass the following simple example to the GCC compiler in a *.m file:
@ interface A
@ end
the compiler accepts the code without complaint. Can anyone point me in the direction of a reference which says explicitly whether or not @ interface is also considered acceptable by the Objective-C specification? I found nothing in Apple's 2008 and 2011 documents to say one way or the other besides the simple text alluded to earlier in the question.
Thanks in advance.
EDIT: It may be worth noting that Emacs performs text coloring based on whether the identifier is a keyword or not; keywords are blue and non-keywords are yellow. @interface colors blue and @ interface colors yellow. Similar behavior occurs in Vim.
There is no formal specification for Objective-C (beyond The Objective-C Programming Language). There's definitely no BNF-style definition of the whitespace conventions. If it compiles, that's about the closest we have to "legal." This is true of many languages. Perl for instance is best defined as "those strings which the perl executable will not reject." (At least in my opinion....)
That said, the correct style is @interface without a space. See Defining a Class.
How would I go about adding a relatively trivial keyword to Objective-C using the Clang compiler? For example, adding a literal @yes which maps to [NSNumber numberWithBool:YES].
I have looked at the (excellent) source code for Clang and believe that most of the work I would need to do is in lib/Rewrite/RewriteObjC.cpp. There is the method RewriteObjC::RewriteObjCStringLiteral (see previous link) which does a similar job for literal NSString * instances.
I ask this question as Clang is very modular and I'm not sure which .td (see tablegen) files, .h files and AST visitor passes I would need to modify to achieve my goal.
If I understand the clang's code correctly (I'm still learning, so take caution), I think the starting point for this type of addition would be in Parser::ParseObjCAtExpression within clang/lib/Parse/ParseObjc.cpp.
One thing to note is that the Parser class is implemented in several files (seemingly separated by input language), but is declared entirely in clang/include/Parser.h.
Parser has many methods following the pattern of ParseObjCAt, e.g.,
ParseObjCAtExpression
ParseObjCAtStatement
ParseObjCAtDirectives
etc..
Specifically, line 1779 of ParseObjc.cpp appears to be where the parser detects an Objective-C string literal in the form of @"foo". However, it also calls ParsePostfixExpressionSuffix, which I don't fully understand yet. I haven't figured out how it knows to parse a string literal (vs. an @synchronized, for example).
ExprResult Parser::ParseObjCAtExpression(SourceLocation AtLoc) {
...
return ParsePostfixExpressionSuffix(ParseObjCStringLiteral(AtLoc));
...
}
If you haven't yet, visit clang's "Getting Started" page to get started with compiling.