Upgrading Grammar file to Antlr4 - migration

I am upgrading my Antlr grammar file to latest Antlr4.
I have converted most of the file but stuck in syntax difference that I can't figure out. The 3 such difference is:
equationset: equation* EOF!;
equation: variable ASSIGN expression -> ^(EQUATION variable expression)
;
orExpression
: andExpression ( OR^ andExpression )*
;
In first one, the error is due to !. I am not sure whether EOF and EOF! is same or not. Removing ! resolves the error, but I want to be sure that is the correct fix.
In 2nd rule, -> and ^ is giving error. I am not sure what is Antlr4 equivalent.
In 3rd rule, ^ is giving error. Removing it fixes the error, but I can't find any migration guide that explains what should be equivalent for this.
Can you please give me the Antrl4 equivalent of these 3 rules and give some brief explanation what is the difference? If you can refer to any other resource where I can find the answer is OK as well.
Thanks in advance.

Many of the ANTLR3 grammars contain syntax tree manipulations which are no longer supported with ANTLR4 (now we get a parse tree instead of a syntax tree). What you see here is exactly that.
EOF! means EOF should be matched but not appear in the AST. Since there is no AST anymore you cannot change that, so remove the exclamation mark.
The construct -> ^(EQUATION variable expression) rewrites the AST created by the equation rule. Since there is no AST anymore you cannot change that, so remove that part.
OR^ finally determines that the OR operator should become the root of the generated AST. Since there is no AST anymore ..., you got the point now :-)

Related

Did "!", "^" and "$" had a special meaning in Antlr3?

I dont have any prior knowledge about ANTLR(I recently learned a little bit about ANTLR4), but I have to translate an old grammar to a newer version and eclipse is telling me, that their are no viable alternatives for those characters and shows the syntax error " '!' came as a complete surprise to me".
I already deleted those characters and it does not seam to be a problem, but maybe it had a special function in ANTLR3.
Thanks in advance.
global_block:
DATABASE! IDENTIFIER!
| GLOBALS! define_section!+ END! GLOBALS!
| GLOBALS! STRING!
;
main_block: MAIN sequence? END em=MAIN
-> ^(MAIN MAIN '(' ')' sequence? $em)
;
^ and -> are related to tree rewriting: https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687090/Tree+construction
ANTLR4 does not support it (v4 has listeners and visitors for tree traversal, but no rewriting anymore). Just remove all of these ! and -> ... in parser rules (do not remove the -> ... inside lexer rules like -> channel(...), which is still supported in v4).
So in your case, these rules would be valid in ANTLR4:
global_block
: DATABASE IDENTIFIER
| GLOBALS define_section+ END GLOBALS
| GLOBALS STRING
;
main_block
: MAIN sequence? END MAIN
;
The $ can still be used in ANTLR4: they are used to reference sub-rules or tokens:
expression
: lhs=expression operator=(PLUS | MINUS) rhs=expression
| NUMBER
;
so that in embedded code block, you can do: $lhs.someField.someMethod(). In your case, you can also just remove them because they are probably only used in the tree rewrite rules.
EDIT
kaby76 has a Github page with some instructions for converting grammars to ANTLR4: https://github.com/kaby76/AntlrVSIX/blob/master/doc/Import.md#antlr3

Antlr4 No viable alternative at input symbols

I'm implementing a simple program walker grammar and I get this common error in multiple lines. I think it is caused by same reason, but I'm new to antlr so I couldn't figure it out.
For example, in this following code snippet:
program
: (declaration)*
(statement)*
EOF!
;
I got error:
No viable alternative at input '!'
after EOF, and I got a similar error with:
declaration
: INT VARNUM '=' expression ';'
-> ^(DECL VARNUM expression)
;
I got the error:
No viable alternative at input '->'
After reading other questions, I know that matching one token with multiple definitions can cause this problem. But I haven't test it with any input yet, I got this error in intelliJ. How can I fix my problem?
This is ANTLR v3 syntax, you're trying to compile it with ANTLR v4, which won't work.
Either downgrade to ANTLR v3, or use v4 syntax. The difference comes from the fact that v4 doesn't support automatic AST generation, and you're trying to use AST construction operators, which were removed.
The first snippet only requires you to remove the !. Parentheses aren't necessary.
program
: declaration*
statement*
EOF
;
As for the second one, remove everything after the ->:
declaration
: INT VARNUM '=' expression ';'
;
If you need to build an AST with v4, see my answer here.

Antlr Arrow Syntax

I found this syntax in an Antlr parser for bash:
file_descriptor
: DIGIT -> ^(FILE_DESCRIPTOR DIGIT)
| DIGIT MINUS -> ^(FILE_DESCRIPTOR_MOVE DIGIT);
What does the -> syntax do?
What is it called such that I can google it to read about it?
The 'Definitive Guide to Antlr4' only has one page about it. It refers to "lexer command", but it never names the operator. The usage in the book differs from the usage in the bash parser.
In ANTLR3, -> is used in parser rules and signifies a tree rewrite rule, which is no longer supported in ANTLR4.
In ANTLR4, the -> is used in lexer rules and has nothing to do with the old v3 functionality.

ANTLR error(99) grammar has no rules

I previously posted about my first attempt at using ANTLR when I was having issues with left recursion.
Now that I have resolved those issues, I am getting the following error when I try to use org.antlr.v4.Tool to generate the code:
error(99): C:test.g4::: grammar 'test' has no rules
What are the possible reasons for this error? Using ANTLRWorks I can certainly see rules in the Parse Tree so why can't it see them? Is it because it cannot find a suitable START rule?
I think Antlr expects the first rule name to be in small case. I was getting the same error with my grammar
grammar ee;
Condition : LogicalExpression ;
LogicalExpression : BooleanLiteral ;
BooleanLiteral : True ;
True : 'true' ;
By changing the first production rule in the grammar to lower case it solved the issue i.e. the below grammar worked for me.
grammar ee;
condition : LogicalExpression ;
LogicalExpression : BooleanLiteral ;
BooleanLiteral : True ;
True : 'true' ;
Note: It is my personal interpretation, I couldn't find this reasoning in the online documentation.
Edit: The production rules should begin with lower case letters as specified in the latest docs [1]
[1] https://github.com/antlr/antlr4/blob/master/doc/lexicon.md#identifiers
I'm not sure if you've found the solution for this, but I had the same problem and fixed it by changing my start symbol to 'prog'. So for example, the first two lines of your .g4 file would be:
grammar test;
prog : <...> ;
Where <...> will be your first derivation.
I just got that error as well (antlworks 2.1).
switching from
RULE : THIS | THAT ; to rule : this | that ; for parser rules (i.e. from uppercase to lowercase) solved the problem!
EDIT
The above correction holds only for RULE , what follows after the : can be any combination of lexer/parser rules
The most likely cause is just what the error message suggests. And the most likely reason for that is that you have not saved your grammar to the file--or if you're using ANTLRWorks2--ANTLRWorks hasn't saved your work to the file. I have no idea why ANTLRWorks doesn't save reliably.
I also got the same error but could not fix it.
I downloaded antlrworks-1.4.jar and it's working perfectly.
Download >> antlrworks-1.4.jar
Changing the first rule to start with a lower case character worked for me.
I had the same problem, and this means that your grammar has no Syntactic rules. So in order to avoid this error, you need to write at least one Syntactic rule.

Yacc "rule useless due to conflicts"

i need some help with yacc.
i'm working on a infix/postfix translator, the infix to postfix part was really easy but i'm having some issue with the postfix to infix translation.
here's an example on what i was going to do (just to translate an easy ab+c- or an abc+-)
exp: num {printf("+ ");} exp '+'
| num {printf("- ");} exp '-'
| exp {printf("+ ");} num '+'
| exp {printf("- ");} num '-'
|/* empty*/
;
num: number {printf("%d ", $1);}
;
obiously it doesn't work because i'm asking an action (with the printfs) before the actual body so, while compiling, I get many
warning: rule useless in parser due to conflict
the problem is that the printfs are exactly where I need them (or my output wont be an infix expression). is there a way to keep the print actions right there and let yacc identify which one it needs to use?
Basically, no there isn't. The problem is that to resolve what you've got, yacc would have to have an unbounded amount of lookahead. This is… problematic given that yacc is a fairly simple-minded tool, so instead it takes a (bad) guess and throws out some of your rules with a warning. You need to change your grammar so yacc can decide what to do with a token with only a very small amount of lookahead (a single token IIRC). The usual way to do this is to attach the interpretations of the values to the tokens and either use a post-action or, more practically, build a tree which you traverse as a separate step (doing print out of an infix expression from its syntax tree is trivial).
Note that when you've got warnings coming out of yacc, that typically means that your grammar is wrong and that the resulting parser will do very unexpected things. Refine it until you get no warnings from that stage at all. That is, treat grammar warnings as errors; anything else and you'll be sorry.