error(211): [fatal] rule conditions has non-LL(*) - antlr

I use ANTLR to create a grammar, but I get this error
error(211): [fatal] rule conditions has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
my grammar rules:
conditions
: '(' conditions ')'
| condition (C_BINARY_OPERATOR conditions)?
;
condition
: expression C_CONDITIONAL_OPERATOR expression
;
expression
: (term) (('+'|'-') term)*
;
term
: (factor) (('*' | '/') factor)*
;
factor
: C_ID
| C_NUMBERS
| '(' expression ')'
;
// Binary Operators for Logical Calculation
C_BINARY_OPERATOR
: '&&'
| '||'
;
// Conditonal Operators
C_CONDITIONAL_OPERATOR
: '>'
| '<'
| '=='
| '!='
| '=<'
| '=>'
;
How can I fix this error?

See this page on the ANTLR website. It has information on how to fix your error.

Well, the error does say "Resolve by left-factoring or using syntactic predicates or using backtrack=true option". Is that confusing?

Related

Xtext Grammar Ambiguities (Backtracking is not working)

I am trying to use Xtext to design a simple language for operations on sets of numbers.
Here are some examples of strings in the language:
{2,1+6} (A set of numbers 2 and 7)
{1+3, 3+5} + {2..5} (A union of sets {4, 8} and {2, 3, 4, 5})
I am using the following grammar:
grammar org.example.Set with org.eclipse.xtext.common.Terminals
generate set "http://www.set.net/set"
SetAddition returns SetExpression:
SetMultiplication ({SetAddition.left=current} '+' right=SetMultiplication)*
;
SetMultiplication returns SetExpression:
SetPrimary ({SetMultiplication.left=current} ('*'|'\\') right=SetPrimary)*
;
SetPrimary returns SetExpression:
SetAtom | '(' SetAddition ')'
;
SetAtom returns SetExpression:
Set | Range
;
Set:
lhs = '{' (car=ArithmeticTerm (',' cdr+=ArithmeticTerm)*)? '}'
;
Range:
'{' lhs=ArithmeticTerm '.' '.' rhs=ArithmeticTerm '}'
;
ArithmeticTerm:
Addition //| Multiplication
;
Addition returns ArithmeticTerm:
Multiplication ({Addition.lhs=current} ('+'|'-') rhs=Multiplication)*
;
Multiplication returns ArithmeticTerm:
Primary ({Multiplication.lhs=current} ('*'|'/'|'%') rhs=Primary)*
;
Primary returns ArithmeticTerm:
ArithmeticAtom |
'(' Addition ')'
;
ArithmeticAtom:
value = INT
;
When I execute MWE2 workflow, I get the following error:
error(211): ../net.certware.argument.language.ui/src-gen/net/certware/argument/language/ui/contentassist/antlr/internal/InternalL.g:420:1: [fatal] rule rule__SetAtom__Alternatives has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
I do have backtracking enabled in mwe2 file.
I have this fragment of code in it:
// The antlr parser generator fragment.
fragment = parser.antlr.XtextAntlrGeneratorFragment auto-inject {
options = {
backtrack = true
}
}
And there is no other fragments that mention ANTLR in mwe2 file.
The version of Xtext I am using is Xtext 2.8.0 integrated in Full Eclipse available from Xtext Website.
Why does ANTLR suggest me to enable backtracking if it is already enabled?
Is there anything wrong with my grammar?
The error stems from your syntax for
SetPrimary returns SetExpression:
SetAtom | '(' SetAddition ')'
;
and
Primary returns ArithmeticTerm:
ArithmeticAtom |
'(' Addition ')'
;
which can be reduced to
SetPrimary returns SetExpression:
ArithmeticAtom | '(' Addition ')' | '(' SetAddition ')'
;
Since Addition and SetAddition are indistinguishable with finite lookahead (both can start with an infinite number of opening ( ). Therefore you need backtracking in the first place - you may want to reconsider the syntax or the AST structure.
Anyway, please add the backtracking also to the XtextAntlrUiGeneratorFragment in your workflow.

Exclude tokens from Identifier lexical rule

I have Identifier lexical rule:
Identifier
: ( 'a'..'z' | 'A'..'Z' | '_' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' )*
;
LogicalOr and LogicalAnd rules:
LogicalOr : '| ' | '||' | OR;
LogicalAnd : '&' | '&&' | AND;
fragment Or : '[Oo][Rr]';
fragment And : '[Aa][Nn][Dd]';
strings "and" and "or" are recognized as identifiers, instead of logicalAnd and logicalOr. Could someone help me to solve this problem please?
There are two potential issues at play. First and foremost, ANTLR 3 does not support the character class syntax introduced by ANTLR 4. Your Or fragment literally matches the input [Oo][Rr]; it does not match OR, or, or oR. The same applies to your And fragment. You need to write the rule like this instead:
fragment
Or
: ('O' | 'o') ('R' | 'r')
;
If this does not resolve your issue, then you need to make sure your LogicalOr and LogicalAnd rules are positioned before the Identifier rule in the grammar. The rule which appears first will determine what token type is assigned for this input sequence.

What is wrong with this ANTLR Grammar? Conditional statement nested parenthesis

I've been tasked with writing a prototype of my team's DSL in Java, so I thought I would try it out using ANTLR. However I'm having problems with the 'expression' and 'condition' rules.
The DSL is already well defined so I would like to keep as close to the current spec as possible.
grammar MyDSL;
// Obviously this is just a snippet of the whole language, but it should give a
// decent view of the issue.
entry
: condition EOF
;
condition
: LPAREN condition RPAREN
| atomic_condition
| NOT condition
| condition AND condition
| condition OR condition
;
atomic_condition
: expression compare_operator expression
| expression (IS NULL | IS NOT NULL)
| identifier
| BOOLEAN
;
compare_operator
: EQUALS
| NEQUALS
| GT | LT
| GTEQUALS | LTEQUALS
;
expression
: LPAREN expression RPAREN
| atomic_expression
| PREFIX expression
| expression (MULTIPLY | DIVIDE) expression
| expression (ADD | SUBTRACT) expression
| expression CONCATENATE expression
;
atomic_expression
: SUBSTR LPAREN expression COMMA expression (COMMA expression)? RPAREN
| identifier
| INTEGER
;
identifier
: WORD
;
// Function Names
SUBSTR: 'SUBSTR';
// Control Chars
LPAREN : '(';
RPAREN : ')';
COMMA : ',';
// Literals and Identifiers
fragment DIGIT : [0-9] ;
INTEGER: DIGIT+;
fragment LETTER : [A-Za-z#$#];
fragment CHARACTER : DIGIT | LETTER | '_';
WORD: LETTER CHARACTER*;
BOOLEAN: 'TRUE' | 'FALSE';
// Arithmetic Operators
MULTIPLY : '*';
DIVIDE : '/';
ADD : '+';
SUBTRACT : '-';
PREFIX: ADD| SUBTRACT ;
// String Operators
CONCATENATE : '||';
// Comparison Operators
EQUALS : '==';
NEQUALS : '<>';
GTEQUALS : '>=';
LTEQUALS : '<=';
GT : '>';
LT : '<';
// Logical Operators
NOT : 'NOT';
AND : 'AND';
OR : 'OR';
// Keywords
IS : 'IS';
NULL: 'NULL';
// Whitespace
BLANK: [ \t\n\r]+ -> channel(HIDDEN) ;
The phrase I'm testing with is
(FOO == 115 AND (SUBSTR(BAR,2,1) == 1 OR SUBSTR(BAR,4,1) == 1))
However it is breaking on the nested parenthesis, matching the first ( with the first ) instead of the outermost (see below). In ANTLR3 I solved this with semantic predicates but it seems that ANTLR4 is supposed to have fixed the need for those.
I'd really like to keep the condition and the expression rules separate if at all possible. I have been able to get it to work when merged together in a single expression rule (based on examples here and elsewhere) but the current DSL spec has them as different and I'm trying to reduce any possible differences in behaviour.
Can anyone point out how I can get this all working while maintaining a separate rule for conditions' andexpressions`? Many thanks!
The grammar seems fine to me.
There's one thing going wrong in the lexer: the WORD token is defined before various keywords/operators causing it to get precedence over them. Place your WORD rule at the very end of your lexer rules (or at least after the last keywords which WORD could also match).

how to solve this simple antlr recursive issue

I was reading the URL (and trying to copy) and failed... (great article on antlr too).
https://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlr.html
My solution before I added parenthesis stuff:
whereClause: WHERE expression -> ^(WHERE_CLAUSE expression);
expression: orExpr;
orExpr: andExpr (OR^ andExpr)*;
andExpr: primaryExpr (AND^ primaryExpr)*;
primaryExpr: parameterExpr | inExpr | compExpr;
My solution that failed due to infinite recursion (but I thought the LPAREN^ and RPAREN! where supposed to solve that???)....
whereClause: WHERE^ (expression | orExpr);
expression: LPAREN^ orExpr RPAREN!;
orExpr: andExpr (OR^ andExpr)*;
andExpr: primaryExpr (AND^ primaryExpr)*;
primaryExpr: parameterExpr | inExpr | compExpr | expression;
Notice primaryExpr at bottom has expression tacked back on which has LPAREN and RPAREN, but the WHERE can be an orExpr or expression (i.e. the first expression can use or not use parentheses).
I am sure this is probably a simple issue like a typo that I keep staring at for hours or something.
I was reading the url(and trying to copy) and failed...(great article on antlr too)...
Note that the article explains ANTLR v2, which has a significant different syntax than v3. Better look for a decent ANTLR v3 tutorial here: https://stackoverflow.com/questions/278480/antlr-tutorials
My solution that failed due to infinite recursion (but I thought the LPAREN^ and RPAREN! where supposed to solve that???)....
It would have, if that were the only expression after the WHILE. However, the orExpr is causing the problem in your case (if you remove it, that recursion error will go away).
A parenthesized expression usually has the highest precedence, and should therefor be placed in your primaryExpr rule, like this:
grammar T;
options {
output=AST;
}
parse : whereClause EOF!;
whereClause : WHERE^ expression;
expression : orExpr;
orExpr : andExpr (OR^ andExpr)*;
andExpr : primaryExpr (AND^ primaryExpr)*;
primaryExpr : bool | NUMBER | '('! expression ')'!;
bool : TRUE | FALSE;
TRUE : 'true';
FALSE : 'false';
WHERE : 'where';
LPAREN : '(';
RPAREN : ')';
OR : '||';
AND : '&&';
NUMBER : '0'..'9'+ ('.' '0'..'9'*)?;
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
Now both the input "where true || false" and "where (true || false)" will be parsed in the following AST:

Left-factoring grammar of coffeescript expressions

I'm writing an Antlr/Xtext parser for coffeescript grammar. It's at the beginning yet, I just moved a subset of the original grammar, and I am stuck with expressions. It's the dreaded "rule expression has non-LL(*) decision" error. I found some related questions here, Help with left factoring a grammar to remove left recursion and ANTLR Grammar for expressions. I also tried How to remove global backtracking from your grammar, but it just demonstrates a very simple case which I cannot use in real life. The post about ANTLR Grammar Tip: LL() and Left Factoring gave me more insights, but I still can't get a handle.
My question is how to fix the following grammar (sorry, I couldn't simplify it and still keep the error). I guess the trouble maker is the term rule, so I'd appreciate a local fix to it, rather than changing the whole thing (I'm trying to stay close to the rules of the original grammar). Pointers are also welcome to tips how to "debug" this kind of erroneous grammar in your head.
grammar CoffeeScript;
options {
output=AST;
}
tokens {
AT_SIGIL; BOOL; BOUND_FUNC_ARROW; BY; CALL_END; CALL_START; CATCH; CLASS; COLON; COLON_SLASH; COMMA; COMPARE; COMPOUND_ASSIGN; DOT; DOT_DOT; DOUBLE_COLON; ELLIPSIS; ELSE; EQUAL; EXTENDS; FINALLY; FOR; FORIN; FOROF; FUNC_ARROW; FUNC_EXIST; HERECOMMENT; IDENTIFIER; IF; INDENT; INDEX_END; INDEX_PROTO; INDEX_SOAK; INDEX_START; JS; LBRACKET; LCURLY; LEADING_WHEN; LOGIC; LOOP; LPAREN; MATH; MINUS; MINUS; MINUS_MINUS; NEW; NUMBER; OUTDENT; OWN; PARAM_END; PARAM_START; PLUS; PLUS_PLUS; POST_IF; QUESTION; QUESTION_DOT; RBRACKET; RCURLY; REGEX; RELATION; RETURN; RPAREN; SHIFT; STATEMENT; STRING; SUPER; SWITCH; TERMINATOR; THEN; THIS; THROW; TRY; UNARY; UNTIL; WHEN; WHILE;
}
COMPARE : '<' | '==' | '>';
COMPOUND_ASSIGN : '+=' | '-=';
EQUAL : '=';
LOGIC : '&&' | '||';
LPAREN : '(';
MATH : '*' | '/';
MINUS : '-';
MINUS_MINUS : '--';
NEW : 'new';
NUMBER : ('0'..'9')+;
PLUS : '+';
PLUS_PLUS : '++';
QUESTION : '?';
RELATION : 'in' | 'of' | 'instanceof';
RPAREN : ')';
SHIFT : '<<' | '>>';
STRING : '"' (('a'..'z') | ' ')* '"';
TERMINATOR : '\n';
UNARY : '!' | '~' | NEW;
// Put it at the end, so keywords will be matched earlier
IDENTIFIER : ('a'..'z' | 'A'..'Z')+;
WS : (' ')+ {skip();} ;
root
: body
;
body
: line
;
line
: expression
;
assign
: assignable EQUAL expression
;
expression
: value
| assign
| operation
;
identifier
: IDENTIFIER
;
simpleAssignable
: identifier
;
assignable
: simpleAssignable
;
value
: assignable
| literal
| parenthetical
;
literal
: alphaNumeric
;
alphaNumeric
: NUMBER
| STRING;
parenthetical
: LPAREN body RPAREN
;
// term should be the same as expression except operation to avoid left-recursion
term
: value
| assign
;
questionOp
: term QUESTION?
;
mathOp
: questionOp (MATH questionOp)*
;
additiveOp
: mathOp ((PLUS | MINUS) mathOp)*
;
shiftOp
: additiveOp (SHIFT additiveOp)*
;
relationOp
: shiftOp (RELATION shiftOp)*
;
compareOp
: relationOp (COMPARE relationOp)*
;
logicOp
: compareOp (LOGIC compareOp)*
;
operation
: UNARY expression
| MINUS expression
| PLUS expression
| MINUS_MINUS simpleAssignable
| PLUS_PLUS simpleAssignable
| simpleAssignable PLUS_PLUS
| simpleAssignable MINUS_MINUS
| simpleAssignable COMPOUND_ASSIGN expression
| logicOp
;
UPDATE:
The final solution will use Xtext with an external lexer to avoid to intricacies of handling significant whitespace. Here is a snippet from my Xtext version:
CompareOp returns Operation:
AdditiveOp ({CompareOp.left=current} operator=COMPARE right=AdditiveOp)*;
My strategy is to make a working Antlr parser first without a usable AST. (Well, it would deserve a separates question if this is a feasible approach.) So I don't care about tokens at the moment, they are included to make development easier.
I am aware that the original grammar is LR. I don't know how close I can stay to it when transforming to LL.
UPDATE2 and SOLUTION:
I could simplify my problem with the insights gained from Bart's answer. Here is a working toy grammar to handle simple expressions with function calls to illustrate it. The comment before expression shows my insight.
grammar FunExp;
ID: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
NUMBER: '0'..'9'+;
WS: (' ')+ {skip();};
root
: expression
;
// atom and functionCall would go here,
// but they are reachable via operation -> term
// so they are omitted here
expression
: operation
;
atom
: NUMBER
| ID
;
functionCall
: ID '(' expression (',' expression)* ')'
;
operation
: multiOp
;
multiOp
: additiveOp (('*' | '/') additiveOp)*
;
additiveOp
: term (('+' | '-') term)*
;
term
: atom
| functionCall
| '(' expression ')'
;
When you generate a lexer and parser from your grammar, you see the following error printed to your console:
error(211): CoffeeScript.g:52:3: [fatal] rule expression has non-LL(*) decision due to recursive rule invocations reachable from alts 1,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
warning(200): CoffeeScript.g:52:3: Decision can match input such as "{NUMBER, STRING}" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
(I've emphasized the important bits)
This is only the first error, but you start with the first and with a bit of luck, the errors below that first one will also disappear when you fix the first one.
The error posted above means that when you're trying to parse either a NUMBER or a STRING with the parser generated from your grammar, the parser can go two ways when it ends up in the expression rule:
expression
: value // choice 1
| assign // choice 2
| operation // choice 3
;
Namely, choice 1 and choice 3 both can parse a NUMBER or a STRING, as you can see by the "paths" the parser can follow to match these 2 choices:
choice 1:
expression
value
literal
alphaNumeric : {NUMBER, STRING}
choice 3:
expression
operation
logicOp
relationOp
shiftOp
additiveOp
mathOp
questionOp
term
value
literal
alphaNumeric : {NUMBER, STRING}
In the last part of the warning, ANTLR informs you that it ignores choice 3 whenever either a NUMBER or a STRING will be parsed, causing choice 1 to match such input (since it is defined before choice 3).
So, either the CoffeeScript grammar is ambiguous in this respect (and handles this ambiguity somehow), or your implementation of it is wrong (I'm guessing the latter :)). You need to fix this ambiguity in your grammar: i.e. don't let the expression's choices 1 and 3 both match the same input.
I noticed 3 other things in your grammar:
1
Take the following lexer rules:
NEW : 'new';
...
UNARY : '!' | '~' | NEW;
Be aware that the token UNARY can never match the text 'new' since the token NEW is defined before it. If you want to let UNARY macth this, remove the NEW rule and do:
UNARY : '!' | '~' | 'new';
2
In may occasions, you're collecting multiple types of tokens in a single one, like LOGIC:
LOGIC : '&&' | '||';
and then you use that token in a parser rules like this:
logicOp
: compareOp (LOGIC compareOp)*
;
But if you're going to evaluate such an expression at a later stage, you don't know what this LOGIC token matched ('&&' or '||') and you'll have to inspect the token's inner text to find that out. You'd better do something like this (at least, if you're doing some sort of evaluating at a later stage):
AND : '&&';
OR : '||';
...
logicOp
: compareOp ( AND compareOp // easier to evaluate, you know it's an AND expression
| OR compareOp // easier to evaluate, you know it's an OR expression
)*
;
3
You're skipping white spaces (and no tabs?) with:
WS : (' ')+ {skip();} ;
but doesn't CoffeeScript indent it's code block with spaces (and tabs) just like Python? But perhaps you're going to do that in a later stage?
I just saw that the grammar you're looking at is a jison grammar (which is more or less a bison implementation in JavaScript). But bison, and therefor jison, generates LR parsers while ANTLR generates LL parsers. So trying to stay close to the rules of the original grammar will only result in more problems.