I'm new to ANTLR and using ANTLR4 (4.7.2 Jar file). I'm currently working on Oracle Parser.
Is there a way to add a node (with some text) directly to the AST from the Parser or Lexer?
I'm hiding comments in my Lexer and would like to add that directly to the tree.
Is it possible? I believe Less4j allows something similar.
Is there a way to add a node (with some text) directly to the AST from the Parser or Lexer?
Not from the lexer: at that phase there is no parse tree yet.
From the parser you could, but there's no ANTLR API to do that. ANTLR gives you the parse tree exactly as it parsed your input and offers no supported way to mutate it. You'll have to build your own tree while you traverse the ANTLR parse tree and make the changes yourself at that stage (including reading the hidden channel).
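The "build your own tree and read the hidden channel yourself" step might look roughly like the sketch below. It is self-contained and does not use the ANTLR runtime: `Tok`, `OwnNode` and `CommentAttacher` are hypothetical names, channel 1 is assumed to be the hidden channel, and the tree is kept flat for brevity. In a real ANTLR 4 project you would do the same thing per terminal node while visiting the parse tree, and `BufferedTokenStream.getHiddenTokensToLeft(tokenIndex, channel)` can supply the hidden tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an ANTLR token: text plus the channel it was emitted on.
final class Tok {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1; // assumption: comments are sent to channel 1
    final String text;
    final int channel;
    Tok(String text, int channel) { this.text = text; this.channel = channel; }
}

// Hypothetical node type for the tree you build yourself.
final class OwnNode {
    final String label;
    final List<OwnNode> children = new ArrayList<>();
    OwnNode(String label) { this.label = label; }

    // Render the tree in a compact LISP-like form, e.g. (root a b).
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (OwnNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

final class CommentAttacher {
    // Every visible token becomes a leaf under the root; the hidden tokens seen
    // immediately before it become "comment" children of that leaf.
    // Hidden tokens after the last visible token are dropped in this sketch.
    static OwnNode build(List<Tok> tokens) {
        OwnNode root = new OwnNode("root");
        List<Tok> pendingHidden = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.channel == Tok.HIDDEN_CHANNEL) {
                pendingHidden.add(t);
                continue;
            }
            OwnNode leaf = new OwnNode(t.text);
            for (Tok h : pendingHidden) leaf.children.add(new OwnNode("comment:" + h.text));
            pendingHidden.clear();
            root.children.add(leaf);
        }
        return root;
    }
}
```

Attaching a comment to the following token is only one possible policy; attaching it to the preceding token, or to the enclosing rule node, works the same way.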
I understand that one can generate a lexer and parser from an antlr4 grammar, but is there a way to generate a builder from the same grammar? That way the client could use the builder to construct any structure the grammar allows, while the server uses the generated parser to parse that structure.
There is, yes. Such a sentence generator can walk the ATN and create sentences according to the grammar (see my antlr4-vscode extension for how this can be implemented). However, unless you have a very simple grammar with no recursion or iteration, you will probably not be able to generate a fixed set of sentences, since there are infinitely many possible combinations.
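The idea of a grammar-driven sentence generator, and the reason recursion needs a bound, can be sketched without ANTLR. The code below does not walk an ATN; it uses a hand-written rule table instead, and `SentenceGenerator`, `rule` and `generate` are hypothetical names chosen for this illustration. It assumes the first alternative of every rule is non-recursive, so that the generator can fall back to it once a depth budget is used up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class SentenceGenerator {
    // Rule name -> alternatives; each alternative is a sequence of symbols.
    // A symbol that is not a key of this map is treated as a literal token.
    private final Map<String, List<List<String>>> rules = new LinkedHashMap<>();
    private final Random random;

    SentenceGenerator(long seed) { this.random = new Random(seed); }

    // Register a rule; symbols inside one alternative are separated by single spaces.
    void rule(String name, String... alternatives) {
        List<List<String>> alts = new ArrayList<>();
        for (String alt : alternatives) alts.add(Arrays.asList(alt.split(" ")));
        rules.put(name, alts);
    }

    // Expand a symbol into a sentence. Once the depth budget is spent, always take
    // the first alternative (assumed non-recursive), which keeps recursive rules
    // from expanding forever.
    String generate(String symbol, int depthBudget) {
        List<List<String>> alts = rules.get(symbol);
        if (alts == null) return symbol; // literal token
        List<String> chosen = depthBudget <= 0
                ? alts.get(0)
                : alts.get(random.nextInt(alts.size()));
        StringBuilder sb = new StringBuilder();
        for (String s : chosen) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(generate(s, depthBudget - 1));
        }
        return sb.toString();
    }
}
```

For example, after `g.rule("expr", "INT", "expr + expr", "( expr )")` and `g.rule("INT", "1", "2")`, calling `g.generate("expr", 4)` returns one random sentence of bounded size. Without the budget, the set of possible sentences for `expr` is unbounded.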
If I have an AST and modify it, can I use StringTemplates to generate the source code for the modified AST?
I have successfully implemented my grammar for Antlr4. It generates an AST for the source code, and I use the Visitor class to perform the desired actions. I then modify something in the AST and would like to generate the source code for the modified AST. (I believe this is called pretty-printing?)
Do Antlr's built-in StringTemplates have all the functionality needed to do this? Where should one start (practical advice is very welcome)?
You can walk the tree and use string templates (or even plain string prints) to spit out text equivalents that to some extent reproduce the source text.
But you will find reproducing the source text in a realistic way harder to do than this suggests. If you want back code that the original programmer will not reject, you need to:
Preserve comments. I don't think ANTLR ASTs do this.
Generate layout that preserves the original indentation.
Preserve the radix, leading-zero count, and other "format" properties of literal values.
Regenerate strings with reasonable escapes.
Doing all of this well is tricky. See my SO answer "How to compile an AST back to source code" for more details. (Weirdly, the ANTLR guy suggests not using an AST at all; I'm guessing this is because string templates only work on ANTLR parse trees whose structure ANTLR understands, vs. ASTs which are whatever you home-rolled.)
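Two of the points in the list above, literal format and string escapes, can be illustrated with a small sketch. It assumes that your own AST nodes keep the original lexeme next to the decoded value, and that the target language uses Java-style string escapes; `IntLiteral` and `StringLiteralPrinter` are hypothetical names, not part of ANTLR or StringTemplate.

```java
final class IntLiteral {
    final long value;
    final String originalText; // lexeme as written in the source, e.g. "0x1F" or "007"; null if synthesized

    IntLiteral(long value, String originalText) {
        this.value = value;
        this.originalText = originalText;
    }

    // A literal produced by a transformation has no original spelling to preserve.
    IntLiteral withValue(long newValue) { return new IntLiteral(newValue, null); }

    // Untouched literals print exactly as the programmer wrote them.
    String print() { return originalText != null ? originalText : Long.toString(value); }
}

final class StringLiteralPrinter {
    // Re-escape a decoded string value as a double-quoted literal
    // (assumption: Java-style escapes in the target language).
    static String print(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters become four-digit hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

With this, `new IntLiteral(31, "0x1F").print()` keeps the hexadecimal spelling, while a literal changed through `withValue` falls back to plain decimal. Comments and layout need similar bookkeeping and are not covered here.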
If you get all of this right, what you are likely to discover is that modifying the parse tree/AST is harder than it looks. For almost any interesting task on complex languages, you need information which is not trivial to extract from the tree (e.g., what is the meaning of this identifier? where is this variable used?). I call this the problem of Life After Parsing. My main point is that it takes a lot of machinery to modify ASTs and regenerate code; be aware of the size of your project.
Where do we start to manually build a CST from scratch? Or does ANTLR4 always require the lex/parse process as our input step?
I have some visual elements in my program that represent code structures.
e.g. a square represents a class, while a circle embedded within that square represents a method.
Now I want to turn those into code. How do I use ANTLR4 to do this, at runtime (using ANTLR4.js)? Most of the ANTLR examples seem to rely on lexing and parsing existing code to get to a syntax tree. So rather than:
input code->lex->parse->syntax tree->output code (1)
I want
manually create syntax tree->output code (2)
(Later, as the user adds code to that class and its methods, then ANTLR will be used as in (1).)
EDIT Maybe I'm misunderstanding this. Do I create some custom data structure and then run the parser over it? i.e. write structures to some in-memory format->parse->output code (3)?
IIUC, you could use StringTemplate directly.
By way of background, Antlr itself builds an in-memory parse-tree and then walks it, incrementally calling StringTemplate to output code snippets qualified by corresponding parse-tree node data. That Antlr uses an internal parse-tree is just a convenience for simplifying walking (since Antlr is built using Antlr).
If you have your own data structure, regardless of its specific implementation, procedurally process it to progressively call ST templates to emit the corresponding code. And, you can directly use the same templates that Antlr uses (JavaScript.stg), if they meet your requirements.
Of course, if your data structure is of a nature that can be lex'd/parsed into a standard Antlr parse-tree, you can then use a standard Antlr visitor to call and populate node-specific templates.
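As a rough sketch of workflow (2), the code below walks a hand-built data structure and emits source text with no lexing or parsing involved. It is written in Java for illustration and uses `String.format` where a StringTemplate template would normally go; `ClassShape`, `MethodShape` and `CodeEmitter` are hypothetical names standing in for the visual elements (square, circle) described in the question, and the emitted JavaScript class skeleton is an assumed output format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a "circle": a method inside a class.
final class MethodShape {
    final String name;
    MethodShape(String name) { this.name = name; }
}

// Hypothetical model of a "square": a class that contains methods.
final class ClassShape {
    final String name;
    final List<MethodShape> methods = new ArrayList<>();
    ClassShape(String name) { this.name = name; }

    // Fluent helper so a model can be built in one expression.
    ClassShape method(String methodName) {
        methods.add(new MethodShape(methodName));
        return this;
    }
}

final class CodeEmitter {
    // Walk the model and emit a JavaScript class skeleton. Each String.format call
    // plays the role a named template would play with StringTemplate.
    static String emit(ClassShape cls) {
        StringBuilder body = new StringBuilder();
        for (MethodShape m : cls.methods) {
            body.append(String.format("  %s() {\n  }\n", m.name));
        }
        return String.format("class %s {\n%s}\n", cls.name, body);
    }
}
```

Calling `CodeEmitter.emit(new ClassShape("Account").method("deposit"))` yields a class named `Account` with one empty method. Swapping the format strings for real templates changes how the text is produced, not the shape of the walk.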
I am trying to write a manual tree walker in Java for an AST generated by ANTLR V3. The AST is built using island grammars, similar to the setup described in "ANTLR: call a rule from a different grammar".
In the AST, I have a node for an expression list with each expression as a child node. Now I need to know the line numbers of the COMMAs which separated the expressions. The COMMAs were present during parsing but were removed by the AST rewrite.
I see some resources (here and here) pointing to the usage of CommonTokenStream.getTokens, but I am not sure how I can access the CommonTokenStream while processing the AST. Is there any way I can get the CommonTokenStream used to build the AST?
The complete list of tokens is accessible through CommonTokenStream.getTokens(), which you can call before you call the tree walker. The list of tokens would be an argument to the walker. There's no need to change CommonTree, unless you want the recovered information embedded in the tree.
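A minimal sketch of the lookup, assuming the walker receives the full token list and knows the token index range covered by the expression-list node (in ANTLR 3 a tree node reports this through `getTokenStartIndex()` and `getTokenStopIndex()`). `SrcToken` and `CommaLocator` are hypothetical stand-ins so that the example runs without the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an ANTLR token: only the fields this sketch needs.
final class SrcToken {
    final int type;
    final int line;
    SrcToken(int type, int line) { this.type = type; this.line = line; }
}

final class CommaLocator {
    // Return the line numbers of all tokens of type commaType whose position in
    // allTokens lies between startIndex and stopIndex (both inclusive).
    // Out-of-range indexes are clamped to the bounds of the list.
    static List<Integer> commaLines(List<SrcToken> allTokens, int startIndex,
                                    int stopIndex, int commaType) {
        List<Integer> lines = new ArrayList<>();
        int last = Math.min(stopIndex, allTokens.size() - 1);
        for (int i = Math.max(startIndex, 0); i <= last; i++) {
            SrcToken t = allTokens.get(i);
            if (t.type == commaType) lines.add(t.line);
        }
        return lines;
    }
}
```

Note that this counts every comma in the range, including commas nested inside a child expression such as a function call; filtering those out would need the child nodes' own index ranges.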
I've used the token list to associate hidden tokens such as comments and explicit line numbers (think FORTRAN) with the closest visible token. This was done post-processing the AST and looking at the line, column, and char-index information which is available for both the tokens in the list and the nodes in the AST.
My attempts at doing that during AST construction resulted in hacky, unmaintainable code. The post-processing code, OTOH, is Programming-101 algorithmic.
I have a fairly large ANTLR parser grammar file and want to make a tree grammar for it. But, as far as I know, tree grammar generation can't be done automatically, i.e., I would have to create it manually by copying the parser grammar, removing some unnecessary code, etc. I want to know if there is a systematic way to generate a tree grammar file from a parser grammar file.
P.S. I read an article that insists that 'Manual Tree Walking Is Better Than Tree Grammars'. Is this reliable information? If so, would it be better for me to make a manual tree walker than writing an ANTLR tree grammar file? And then, how do I make a manual tree walker with my ANTLR parser grammar file (it makes an AST using rewrite rules)?
Thanks in advance.
sky wrote:
I want to know if there is a systematic way to generate a tree grammar file from a parser grammar file
You've already described the systematic way to do this: copy the parser/production rules into the tree grammar and leave only the rewrite rules in it. This will probably handle the larger part of your rules, but for other parser rules (those using inline AST rewrite operators), the result might look slightly different. Because of that, there is no automatic way to generate a tree grammar.
sky wrote:
P.S. I read an article that insists that 'Manual Tree Walking Is Better Than Tree Grammars'. Is this reliable information?
Yes, it is. Note that Terence Parr (creator of ANTLR) posted the article on the ANTLR wiki himself, which suggests that its author (Andy Tripp) raises valid points.
sky wrote:
If so, would it be better for me to make a manual tree walker than writing an ANTLR tree grammar file?
As Andy mentioned in his conclusion: "The decision about whether to use a "Tree Grammar" approach to translation vs. just "doing it by hand" is a matter of taste." So, if you think writing a tree grammar is too much hassle, go the manual way. It's up to you: there is no best way here.
sky wrote:
And then, how do I make a manual tree walker with my ANTLR parser grammar file (it makes an AST using rewrite rules)?
Your parser will create an AST, which by default is of type CommonTree (API-doc). You can use that tree to get the children, the parent, the type of the token etc.: all you need to manually walk the tree.
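A manual walker is usually just a recursive method that switches on the node's token type. The sketch below shows that shape for a small arithmetic AST. `AstNode` is a hypothetical stand-in that mirrors the `getType()`, `getText()`, `getChildCount()` and `getChild(int)` accessors of `CommonTree`, so the example runs without the ANTLR runtime; the token type constants are assumptions, since real ones come from your generated parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CommonTree, reduced to the accessors a walker needs.
final class AstNode {
    static final int INT = 1, PLUS = 2, MULT = 3; // assumed token types
    private final int type;
    private final String text;
    private final List<AstNode> children = new ArrayList<>();

    AstNode(int type, String text, AstNode... kids) {
        this.type = type;
        this.text = text;
        for (AstNode k : kids) children.add(k);
    }

    int getType() { return type; }
    String getText() { return text; }
    int getChildCount() { return children.size(); }
    AstNode getChild(int i) { return children.get(i); }
}

final class ManualWalker {
    // Recursively evaluate an arithmetic AST by switching on the node's token type.
    static int eval(AstNode node) {
        switch (node.getType()) {
            case AstNode.INT:
                return Integer.parseInt(node.getText());
            case AstNode.PLUS:
                return eval(node.getChild(0)) + eval(node.getChild(1));
            case AstNode.MULT:
                return eval(node.getChild(0)) * eval(node.getChild(1));
            default:
                throw new IllegalArgumentException("unexpected node type " + node.getType());
        }
    }
}
```

For the tree `(+ 1 (* 2 3))`, `ManualWalker.eval` visits the `*` subtree first and then adds the left operand. The same pattern works for translation or analysis: replace the returned `int` with whatever the walk should produce.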
EDIT
Note that in the next version of ANTLR (version 4) it will (most likely) be possible to automatically generate a tree walker from a combined or parser grammar.
See:
https://web.archive.org/web/20130620232750/http://www.antlr.org/wiki/display/~admin/ANTLR+v4+plans
https://web.archive.org/web/20130927174157/http://www.antlr.org/wiki/display/~admin/2011/09/05/Auto+tree+construction+and+visitors
https://web.archive.org/web/20130927175520/http://www.antlr.org/wiki/display/~admin/2011/09/08/Sample+v4+generated+visitor