ANTLR: exclude (skip) tokens when building the AST

Given the following grammar (in ANTLR v3):
test : value0 COMMA_KEYWORD value1 (COMMA_KEYWORD value2)*;
How can we exclude (skip) COMMA_KEYWORD from the AST built by ANTLR, without using a rewrite rule?

The alternative to using rewrite rules is to use tree construction operators:
https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687090/Tree+construction
You can use the ! operator to omit a token or subtree from the AST:
test : value0 COMMA_KEYWORD! value1 (COMMA_KEYWORD! value2)*;
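To make the effect of ! concrete, here is a small Python sketch. It is a toy model, not ANTLR-generated code: the function build_children and the token type VALUE are hypothetical names, and suppression is keyed by token type for brevity, whereas in a real v3 grammar the ! is attached to individual grammar elements.

```python
# Toy model of ANTLR v3's "!" suffix; not ANTLR-generated code.
# Assumption: every COMMA_KEYWORD occurrence in the rule carries "!",
# so suppressing by token type gives the same result as per-element "!".

def build_children(matched, suppressed_types):
    """Return the AST children of a rule, dropping tokens marked with '!'."""
    return [text for (tok_type, text) in matched
            if tok_type not in suppressed_types]

# Tokens matched by: test : value0 COMMA_KEYWORD! value1 (COMMA_KEYWORD! value2)*;
matched = [
    ("VALUE", "a"),
    ("COMMA_KEYWORD", ","),
    ("VALUE", "b"),
    ("COMMA_KEYWORD", ","),
    ("VALUE", "c"),
]

children = build_children(matched, {"COMMA_KEYWORD"})
print(children)
```

The commas are still required for the input to parse; they are only left out of the resulting tree.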

Related

Upgrading Grammar file to Antlr4

I am upgrading my ANTLR grammar file to the latest ANTLR4.
I have converted most of the file but am stuck on some syntax differences that I can't figure out. The 3 differences are:
equationset: equation* EOF!;
equation: variable ASSIGN expression -> ^(EQUATION variable expression)
;
orExpression
: andExpression ( OR^ andExpression )*
;
In the first rule, the error is caused by the !. I am not sure whether EOF and EOF! are the same. Removing the ! resolves the error, but I want to be sure that this is the correct fix.
In the 2nd rule, -> and ^ give errors. I am not sure what the ANTLR4 equivalent is.
In the 3rd rule, ^ gives an error. Removing it fixes the error, but I can't find any migration guide that explains what the equivalent should be.
Can you please give me the ANTLR4 equivalent of these 3 rules and briefly explain the difference? A pointer to any other resource where I can find the answer is fine as well.
Thanks in advance.
Many of the ANTLR3 grammars contain syntax tree manipulations which are no longer supported with ANTLR4 (now we get a parse tree instead of a syntax tree). What you see here is exactly that.
EOF! means EOF should be matched but not appear in the AST. Since there is no AST anymore you cannot change that, so remove the exclamation mark.
The construct -> ^(EQUATION variable expression) rewrites the AST created by the equation rule. Since there is no AST anymore you cannot change that, so remove that part.
Finally, OR^ determines that the OR operator should become the root of the generated AST. Since there is no AST anymore ..., you get the point now :-)
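If you still need the OR-rooted tree after migrating, you have to build it yourself from the parse tree. The following Python sketch shows the idea on a simplified model: it assumes the children of an orExpression node are available as a flat list in match order, with plain strings standing in for the andExpression subtrees and OR tokens. The names build_or_ast and to_lisp are hypothetical, and no ANTLR runtime API is used.

```python
# Toy model: rebuild the OR-rooted AST that ANTLR 3's "OR^" produced,
# starting from a flat child list in match order.
# The list layout and function names are assumptions for illustration only.

def build_or_ast(children):
    """children alternates operand, 'OR', operand, ... in match order."""
    tree = children[0]
    # Walk the (operator, operand) pairs left to right; each operator
    # becomes the new root, with the tree built so far as its left child.
    for i in range(1, len(children), 2):
        operator, operand = children[i], children[i + 1]
        tree = (operator, tree, operand)
    return tree

def to_lisp(tree):
    """Render a nested tuple as (root child child)."""
    if isinstance(tree, str):
        return tree
    root, *kids = tree
    return "(" + " ".join([root] + [to_lisp(k) for k in kids]) + ")"

flat = ["a", "OR", "b", "OR", "c"]
print(to_lisp(build_or_ast(flat)))  # (OR (OR a b) c)
```

In a real ANTLR4 project this logic would typically live in a visitor or listener over the generated context classes.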

Antlr Arrow Syntax

I found this syntax in an Antlr parser for bash:
file_descriptor
: DIGIT -> ^(FILE_DESCRIPTOR DIGIT)
| DIGIT MINUS -> ^(FILE_DESCRIPTOR_MOVE DIGIT);
What does the -> syntax do?
What is it called such that I can google it to read about it?
The 'Definitive Guide to Antlr4' only has one page about it. It refers to "lexer command", but it never names the operator. The usage in the book differs from the usage in the bash parser.
In ANTLR3, -> is used in parser rules and signifies a tree rewrite rule, which is no longer supported in ANTLR4.
In ANTLR4, the -> is used in lexer rules and has nothing to do with the old v3 functionality.
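To illustrate what a v4 lexer command does, here is a toy tokenizer in Python that mimics the skip command (written `-> skip` in a v4 lexer rule). This is only a sketch: the RULES table, the WS rule, and the tokenize function are made up for the example, and it takes the first matching rule rather than the longest match that ANTLR's lexer would choose.

```python
import re

# Each entry: (token type, regex, lexer command or None).
# "skip" mimics an ANTLR 4 lexer rule such as:  WS : [ \t\r\n]+ -> skip ;
# The rule set itself is hypothetical.
RULES = [
    ("DIGIT", r"[0-9]", None),
    ("MINUS", r"-", None),
    ("WS", r"[ \t\r\n]+", "skip"),
]

def tokenize(text):
    """Split text into (type, text) pairs, discarding tokens marked 'skip'."""
    tokens = []
    pos = 0
    while pos < len(text):
        for tok_type, pattern, command in RULES:
            m = re.compile(pattern).match(text, pos)
            if m:
                if command != "skip":
                    tokens.append((tok_type, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens

print(tokenize("3 -"))
```

The whitespace is matched but never reaches the parser, which is unrelated to the v3 tree rewrite meaning of ->.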

What is the antlr4 (v-4.1) equivalent form of the following grammar rule (written for antlr3 (v-3.2))?

text
: tag => (tag)!
| outsidetag
;
The following is invalid in ANTLR 3:
text
: tag => (tag)!
| outsidetag
;
You probably meant the following:
text
: (tag)=> (tag)!
| outsidetag
;
where ( ... )=> is a syntactic predicate. Syntactic predicates have no ANTLR4 equivalent: simply remove them. As 280Z28 mentioned (and as also explained in the previous link), the absence of syntactic predicates does not mean a feature was removed from ANTLR 4: they were a workaround for a weakness in ANTLR 3's prediction algorithm that no longer applies to ANTLR 4.
The exclamation mark in v3 denotes the removal of that element (a token or subtree) from the generated AST. Since ANTLR4 does not produce ASTs, just remove the exclamation mark as well.
So, the v4 equivalent would look like this:
text
: tag
| outsidetag
;

Explanation for the following grammar written in ANTLR 4

I have a sample grammar written in ANTLR 4
query : select from ';' !? EOF!
I understand how
query : select from ';'
works.
What does !? EOF! mean in the grammar, and how does it work?
The exclamation mark is used in ANTLR v3 grammars to denote that a certain node should be omitted from the generated AST. Since ANTLR v4 does not have ASTs, this construct is no longer used.
In both v3 and v4, the ? denotes that the preceding element (a token or a rule reference) is optional, and EOF is the built-in end-of-file token.
To summarize, ';'!? means: optionally match a ';' and exclude it from the AST. And EOF! means: match the end-of-file and exclude this token from the AST.
So, the v3 parser rule:
query : select from ';'!? EOF!
should look like this in a v4 grammar:
query : select from ';'? EOF

How to represent multiple parents as rewrite rule?

Say I have the following ANTLR rule:
ROOT: 'r' ('0'..'9')*;
CHILD: 'c' ('0'..'9')*;
expression: ROOT ('.'^ CHILD)*;
For input such as r.c1.c2.c3, ANTLR would make the following tree:
.(.(.(r c1) c2) c3)
How can I represent the parent property of '.' without the ^ operator directly, i.e., in a rewrite rule?
expression: ROOT ('.' CHILD)* -> ?
The trick is to reference the tree built so far by the expression rule inside the rewrite rule (the $expression part below):
expression : (ROOT -> ROOT) ('.' CHILD -> ^('.' $expression CHILD))*;
which is equivalent to:
expression: ROOT ('.'^ CHILD)*;
Yeah, I know, it's not pretty, there is no simple syntax like you (may have) hoped for:
expression: ROOT ('.' CHILD)* -> ^(...);
See: Parr's Definitive ANTLR Reference, chapter 7, paragraph "Referencing Previous Rule ASTs in Rewrite Rules", page 174.
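The way $expression accumulates the tree can be sketched in Python. This is a toy model of the rewrite rule above, not ANTLR output: rewrite_expression and show are hypothetical helpers, and the variable expression plays the role of $expression, i.e. the tree the rule has built so far.

```python
# Toy model of:
#   expression : (ROOT -> ROOT) ('.' CHILD -> ^('.' $expression CHILD))*;
# Function names are made up for illustration.

def rewrite_expression(root, children):
    expression = root                           # (ROOT -> ROOT)
    for child in children:
        # ^('.' $expression CHILD): the previous tree becomes the left child
        expression = (".", expression, child)
    return expression

def show(tree):
    """Render a tree in the notation used in the question."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"{op}({show(left)} {show(right)})"

print(show(rewrite_expression("r", ["c1", "c2", "c3"])))  # .(.(.(r c1) c2) c3)
```

Each loop iteration replaces the whole result, which is why the rewrite has to be written inside the ( ... )* loop rather than once at the end of the rule.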