I am relatively new to ANTLR. I have a very simple grammar:
start :
('A' 'B' 'C' '1'
|'A' 'B' 'C' '2'
|'A' 'B' 'C' '3'
)
;
I think I already understand the basics of lookahead and backtracking (which works with syntactic predicates). So this grammar works with k=4 or with backtrack=true. But what is the exact difference and, more importantly, when do I use which? I tried to find the answer on the internet but didn't succeed.
Your grammar works in ANTLR v3 without any options.
The k option limits ANTLR to classical LL(k) parsing. Backtracking means that if the parser cannot predict which alternative to use, it simply tries one and, if that fails, rewinds the input and tries the next.
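To make the difference concrete, here is a minimal Python sketch of both strategies applied to the grammar from the question. It is only an illustration of the idea, not how ANTLR implements prediction internally; the names `ALTERNATIVES`, `predict_with_lookahead` and `parse_with_backtracking` are made up for this example.

```python
# Token-level model of the grammar from the question:
#   start : 'A' 'B' 'C' '1' | 'A' 'B' 'C' '2' | 'A' 'B' 'C' '3' ;
ALTERNATIVES = [
    ["A", "B", "C", "1"],
    ["A", "B", "C", "2"],
    ["A", "B", "C", "3"],
]


def predict_with_lookahead(tokens, k):
    """Choose an alternative by peeking at no more than k tokens.

    Returns the index of the only alternative whose first k tokens match,
    or None when k tokens cannot single out one alternative.
    """
    window = tokens[:k]
    candidates = [i for i, alt in enumerate(ALTERNATIVES) if alt[:k] == window]
    return candidates[0] if len(candidates) == 1 else None


def parse_with_backtracking(tokens):
    """Try each alternative in order; on a mismatch, rewind and try the next."""
    for i, alt in enumerate(ALTERNATIVES):
        position = 0  # every attempt restarts at the beginning of the input
        for expected in alt:
            if position >= len(tokens) or tokens[position] != expected:
                break  # speculation failed, fall through to the next alternative
            position += 1
        else:
            if position == len(tokens):
                return i
    return None
```

With k=3 the predictor sees only `A B C`, which all three alternatives share, so it cannot decide. With k=4 it picks the alternative directly, while the backtracking version reaches the same answer by trial and error.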
You should use the backtrack option when ANTLR cannot build a lookahead DFA for the given grammar. ANTLR v3 can build DFAs from regular expressions fairly easily, but it has difficulties with recursive rules. For example, this grammar works:
start: recursive_rule ';'
| recursive_rule ':'
;
recursive_rule : (ID)* '%'
;
The grammar below accepts the same input, but is expressed through recursion. ANTLR cannot build a DFA for it (I actually don't know why), so you need to switch backtracking on:
start options {backtrack=true;} : recursive_rule ';'
| recursive_rule ':'
;
recursive_rule : ID recursive_rule
|'%'
;
The k option is used to improve the parser performance. I don’t know any other reason for restricting LL(*) to LL(k).
I found a theoretical description that answers my question in the book "The Definitive ANTLR Reference", which was also important for my understanding. Maybe this snippet from the book will help others who ask themselves a similar question.
Page 262
I am upgrading my Antlr grammar file to latest Antlr4.
I have converted most of the file, but I am stuck on some syntax differences that I can't figure out. The three differences are:
equationset: equation* EOF!;
equation: variable ASSIGN expression -> ^(EQUATION variable expression)
;
orExpression
: andExpression ( OR^ andExpression )*
;
In the first one, the error is due to the !. I am not sure whether EOF and EOF! are the same. Removing the ! resolves the error, but I want to be sure that this is the correct fix.
In the second rule, -> and ^ give errors. I am not sure what the ANTLR4 equivalent is.
In the third rule, ^ gives an error. Removing it fixes the error, but I can't find any migration guide that explains what the equivalent should be.
Can you please give me the ANTLR4 equivalent of these three rules and briefly explain the difference? A pointer to any other resource where I can find the answer is fine as well.
Thanks in advance.
Many of the ANTLR3 grammars contain syntax tree manipulations which are no longer supported with ANTLR4 (now we get a parse tree instead of a syntax tree). What you see here is exactly that.
EOF! means EOF should be matched but not appear in the AST. Since there is no AST anymore you cannot change that, so remove the exclamation mark.
The construct -> ^(EQUATION variable expression) rewrites the AST created by the equation rule. Since there is no AST anymore you cannot change that, so remove that part.
OR^ finally determines that the OR operator should become the root of the generated AST. Since there is no AST anymore ..., you got the point now :-)
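If you still need the tree shapes that the v3 operators produced, you have to build them yourself while walking the parse tree. The following Python sketch shows only the reshaping logic, assuming a hypothetical representation where each rule's children arrive as a flat list and AST nodes are plain tuples. It does not use the ANTLR runtime API, and the function names are invented for illustration.

```python
def or_expression_to_ast(children):
    """Fold the flat children of `andExpression (OR andExpression)*`
    into a left-nested tree with each OR as a root, like OR^ did in v3."""
    ast = children[0]
    # Children alternate: operand, "OR", operand, "OR", operand, ...
    for i in range(1, len(children), 2):
        operator, right = children[i], children[i + 1]
        ast = (operator, ast, right)
    return ast


def equation_to_ast(children):
    """Mimic `-> ^(EQUATION variable expression)`: drop the ASSIGN token and
    hang variable and expression under an imaginary EQUATION root."""
    variable, _assign, expression = children
    return ("EQUATION", variable, expression)
```

For example, `or_expression_to_ast(["a", "OR", "b", "OR", "c"])` gives `("OR", ("OR", "a", "b"), "c")`. In a real ANTLR4 project this logic would typically live in a visitor or listener over the generated context classes.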
I'm implementing a simple program walker grammar and I get this common error on multiple lines. I think they all have the same cause, but I'm new to ANTLR so I couldn't figure it out.
For example, in the following code snippet:
program
: (declaration)*
(statement)*
EOF!
;
I got error:
No viable alternative at input '!'
after EOF, and I got a similar error with:
declaration
: INT VARNUM '=' expression ';'
-> ^(DECL VARNUM expression)
;
I got the error:
No viable alternative at input '->'
After reading other questions, I know that matching one token with multiple definitions can cause this problem. But I haven't tested it with any input yet; I get this error in IntelliJ. How can I fix my problem?
This is ANTLR v3 syntax, you're trying to compile it with ANTLR v4, which won't work.
Either downgrade to ANTLR v3, or use v4 syntax. The difference comes from the fact that v4 doesn't support automatic AST generation, and you're trying to use AST construction operators, which were removed.
The first snippet only requires you to remove the !. Parentheses aren't necessary.
program
: declaration*
statement*
EOF
;
As for the second one, remove everything after the ->:
declaration
: INT VARNUM '=' expression ';'
;
If you need to build an AST with v4, see my answer here.
I am looking for a good way to match a filename in Antlr.
The filename could be DOS or Unix style.
If you have a good solution to that, feel free to ignore the rest of this question, because it is just my newbie attempt at solving the problem and I am probably way off. I have included it because some people like to see sample code.
For purposes of discussion, here is what I am thinking. This is not my actual grammar; since all I am interested in for this discussion is filename parsing, I reduced the sample to something that is still somewhat meaningful in that context.
Lexer.g4:
lexer grammar Lexer;
K_COPY : C O P Y ;
FILEPATH: [-.a-zA-Z0-9:/\\]+;
Parser.g4
parser grammar Parser;
options { tokenVocab=Lexer; }
commandfile: (statement NEWLINE)* EOF;
statement : copy_stmt
;
copy_stmt: K_COPY left=filepath right=filepath
;
// Add characters as we make rules as to what characters are valid:
filepath: FILEPATH;
That is what I am thinking but I am new to Antlr so I wanted to get some feedback before I proceed.
Using Antlr for this project is already decided, and a good part of the project is already working in Antlr, so I am only looking for Antlr-based solutions.
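One quick way to sanity-check the FILEPATH character class before committing to it is to mirror it as a regular expression and throw sample paths at it. The Python sketch below assumes the class is meant to contain letters, digits, `-`, `.`, `:`, `/` and a literal backslash; the helper name is made up, and this checks only the character set, not how the ANTLR lexer would resolve conflicts with other rules.

```python
import re

# Mirror of the FILEPATH rule's character class:
# letters, digits, '-', '.', ':', '/' and a literal backslash.
FILEPATH = re.compile(r"[-.a-zA-Z0-9:/\\]+")


def is_single_filepath_token(text):
    """Return True when the whole string fits the FILEPATH character class."""
    return FILEPATH.fullmatch(text) is not None
```

Paths such as `C:\temp\a.txt` and `/usr/local/bin/tool-1.0` pass, while names containing spaces or underscores do not, so the class would need extending to cover those.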
I'm developing a 'toy' language to learn antlr.
My construct for a for loop looks like this.
for(4,10){
//program expressions
};
I have a grammar that I think works, but it's a little ugly. Specifically I'm not sure that I've handled the semantically unimportant tokens very well.
For example, the comma in the middle there appears as a token, but it's unimportant to the parser; it just needs the 4 and the 10 for the loop bounds. This means when I see the child() elements for the parts of the loop token, I have to skip the unimportant ones.
You can probably see this best if you examine the ANTLR viewer and look at the parse tree for this. The red arrows point to the tokens I think are redundant.
I feel like I should be making more use of the skip() feature than I am, but I can't see how to insert it into the grammar for the tokens at this level.
loop: 'for(' foridxitem ',' foridxitem '){' (programexpression)+ '}';
foridxitem: NUM #ForIndexNum
|
var #ForIndexVar;
The short answer is Antlr produces a parse-tree, so there will always be cruft to step over or otherwise ignore when walking the tree.
The longer answer is that there is a tension between skipping cruft in the lexer and producing tokens of limited syntactic value that are nonetheless necessary for writing unambiguous rules.
For example, you identify for( as a candidate for skipping, yet it is probably syntactically required. Conversely, the parameters comma could be truly without syntactic meaning. So, you might clean it up in the lexer (and parser) this way:
FOR: 'for(' -> pushMode(params) ;
ENDLOOP: '}' ;
WS: .... -> skip ;
mode params;
NUM: .... ;
VAR: .... ;
COMMA: ',' -> skip ;
ENDPARAMS: '){' -> skip, popMode ;
P_WS: .... -> skip ;
Your parser rules then become
loop: FOR foridxitem* programexpression+ ENDLOOP ;
foridxitem: NUM | VAR ;
programexpression: .... ;
That should clean up the tree a fair bit.
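To see what the mode switching and skipping buy you, here is a small Python model of such a lexer. It is a sketch under simplifying assumptions, not ANTLR itself: it takes the first matching rule rather than the longest match, the `STMT` rule is a hypothetical stand-in for whatever the loop body contains, and the patterns for `NUM`, `VAR` and whitespace are guesses because the original rules are elided.

```python
import re

# Each mode has an ordered list of (token name, pattern, action) rules.
# Actions: "emit", "skip", "push:<mode>" (emit, then enter mode), "skip+pop".
RULES = {
    "default": [
        ("FOR", r"for\(", "push:params"),
        ("ENDLOOP", r"\}", "emit"),
        ("WS", r"\s+", "skip"),
        ("STMT", r"[^\s}]+", "emit"),  # hypothetical placeholder for body tokens
    ],
    "params": [
        ("NUM", r"\d+", "emit"),
        ("VAR", r"[A-Za-z_]\w*", "emit"),
        ("COMMA", r",", "skip"),
        ("ENDPARAMS", r"\)\{", "skip+pop"),
        ("P_WS", r"\s+", "skip"),
    ],
}


def tokenize(text):
    """Return the (name, text) tokens a parser would see after skipping."""
    modes = ["default"]
    position = 0
    tokens = []
    while position < len(text):
        for name, pattern, action in RULES[modes[-1]]:
            match = re.compile(pattern).match(text, position)
            if match:
                break
        else:
            raise ValueError(f"no rule matches at offset {position}")
        position = match.end()
        if action.startswith("push:"):
            tokens.append((name, match.group()))
            modes.append(action.split(":", 1)[1])
        elif action == "skip+pop":
            modes.pop()  # token is dropped and the lexer leaves the mode
        elif action == "emit":
            tokens.append((name, match.group()))
        # plain "skip" produces no token at all
    return tokens
```

Calling `tokenize("for(4,10){ x; }")` yields only `FOR`, two `NUM` tokens, the body token and `ENDLOOP`; the comma and `){` never reach the parser, so they never show up in the tree.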
I need some help with yacc.
I'm working on an infix/postfix translator. The infix-to-postfix part was really easy, but I'm having some issues with the postfix-to-infix translation.
Here's an example of what I was going to do (just to translate a simple ab+c- or abc+-):
exp: num {printf("+ ");} exp '+'
| num {printf("- ");} exp '-'
| exp {printf("+ ");} num '+'
| exp {printf("- ");} num '-'
|/* empty*/
;
num: number {printf("%d ", $1);}
;
Obviously it doesn't work, because I'm asking for an action (the printfs) before the actual body, so when compiling I get many
warning: rule useless in parser due to conflict
The problem is that the printfs are exactly where I need them (or my output won't be an infix expression). Is there a way to keep the print actions right there and let yacc identify which one it needs to use?
Basically, no there isn't. The problem is that to resolve what you've got, yacc would have to have an unbounded amount of lookahead. This is… problematic given that yacc is a fairly simple-minded tool, so instead it takes a (bad) guess and throws out some of your rules with a warning. You need to change your grammar so yacc can decide what to do with a token with only a very small amount of lookahead (a single token IIRC). The usual way to do this is to attach the interpretations of the values to the tokens and either use a post-action or, more practically, build a tree which you traverse as a separate step (doing print out of an infix expression from its syntax tree is trivial).
Note that when you've got warnings coming out of yacc, that typically means that your grammar is wrong and that the resulting parser will do very unexpected things. Refine it until you get no warnings from that stage at all. That is, treat grammar warnings as errors; anything else and you'll be sorry.
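The tree-building approach mentioned above can be sketched independently of yacc. The Python below is only an illustration of the idea, assuming single-character operands and just the `+` and `-` operators from the question; in a yacc grammar the same tree nodes would be created in end-of-rule actions and printed in a separate pass. The function names are invented for this example.

```python
def postfix_to_tree(tokens):
    """Build an expression tree from postfix tokens using a stack."""
    stack = []
    for token in tokens:
        if token in ("+", "-"):
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append((token, left, right))
        else:
            stack.append(token)  # operand: push as a leaf
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def to_infix(tree):
    """Render the tree in infix order, parenthesising every operation."""
    if isinstance(tree, tuple):
        operator, left, right = tree
        return f"({to_infix(left)} {operator} {to_infix(right)})"
    return tree
```

Here `to_infix(postfix_to_tree(list("ab+c-")))` produces `((a + b) - c)` and `list("abc+-")` produces `(a - (b + c))`. The operator is only emitted once both operands are known, which is why no unbounded lookahead is needed.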