"The following sets of rules are mutually left-recursive" - antlr

I have tried to write a grammar to recognize expressions like:
(A + MAX(B) ) / ( C - AVERAGE(A) )
IF( A > AVERAGE(A), 0, 1 )
X / (MAX(X)
Unfortunately, ANTLR 3 fails with these errors:
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
error(211): DerivedKeywords.g:110:13: [fatal] rule booleanTerm has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
error(206): DerivedKeywords.g:110:13: Alternative 1: after matching input such as decision cannot predict what comes next due to recursion overflow to additiveExpression from formula
I have spent some hours trying to fix these. It would be great if anyone could at least help me fix the first problem. Thanks
Code:
grammar DerivedKeywords;
options {
output=AST;
//backtrack=true;
}
WS : ( ' ' | '\t' | '\n' | '\r' )
{ skip(); }
;
//for numbers
DIGIT
: '0'..'9'
;
//for both integer and real number
NUMBER
: (DIGIT)+ ( '.' (DIGIT)+ )?( ('E'|'e')('+'|'-')?(DIGIT)+ )?
;
// Boolean operators
AND : 'AND';
OR : 'OR';
NOT : 'NOT';
EQ : '=';
NEQ : '!=';
GT : '>';
LT : '<';
GTE : '>=';
LTE : '<=';
COMMA : ',';
// Token for Functions
IF : 'IF';
MAX : 'MAX';
MIN : 'MIN';
AVERAGE : 'AVERAGE';
VARIABLE : 'A'..'Z' ('A'..'Z' | '0'..'9')*
;
// OPERATORS
LPAREN : '(' ;
RPAREN : ')' ;
DIV : '/' ;
PLUS : '+' ;
MINUS : '-' ;
STAR : '*' ;
expression : formula;
formula
: functionExpression
| additiveExpression
| LPAREN! a=formula RPAREN! // First Problem
;
additiveExpression
: a=multiplicativeExpression ( (MINUS^ | PLUS^ ) b=multiplicativeExpression )*
;
multiplicativeExpression
: a=unaryExpression ( (STAR^ | DIV^ ) b=unaryExpression )*
;
unaryExpression
: MINUS^ u=unaryExpression
| primaryExpression
;
functionExpression
: f=functionOperator LPAREN e=formula RPAREN
| IF LPAREN b=booleanExpression COMMA p=formula COMMA s=formula RPAREN
;
functionOperator :
MAX | MIN | AVERAGE;
primaryExpression
: NUMBER
// Used for scientific numbers
| DIGIT
| VARIABLE
| formula
;
// Boolean stuff
booleanExpression
: orExpression;
orExpression : a=andExpression (OR^ b=andExpression )*
;
andExpression
: a=notExpression (AND^ b=notExpression )*
;
notExpression
: NOT^ t=booleanTerm
| booleanTerm
;
booleanOperator :
GT | LT | EQ | GTE | LTE | NEQ;
booleanTerm : a=formula op=booleanOperator b=formula
| LPAREN! booleanTerm RPAREN! // Second problem
;

error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
This means that once the parser enters unaryExpression, it can go through primaryExpression, formula, additiveExpression and multiplicativeExpression and arrive back at unaryExpression without consuming a single token from the input. The input is the same whether or not the parser follows that path, so it cannot decide whether to use those rules.
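The cycle can be made concrete with a small Python sketch. For each rule, it records which rules can appear as the first symbol of one of its alternatives, then looks for rules that can reach themselves. It is a simplified model of the question's grammar. Tokens are left out because consuming a token breaks the cycle, and the model ignores rules that can match empty input.

```python
# Simplified "left corner" table for the question's grammar: for each rule,
# the rules that can be the first symbol of one of its alternatives.
# Tokens (NUMBER, MINUS, LPAREN, IF, ...) are left out on purpose.
LEFT_CORNERS = {
    "formula": ["functionExpression", "additiveExpression"],
    "functionExpression": ["functionOperator"],
    "functionOperator": [],
    "additiveExpression": ["multiplicativeExpression"],
    "multiplicativeExpression": ["unaryExpression"],
    "unaryExpression": ["primaryExpression"],
    "primaryExpression": ["formula"],  # the alternative that closes the cycle
}


def left_recursive_rules(left_corners):
    """Return the rules that can derive themselves again without consuming a token."""
    found = set()
    for start in left_corners:
        # Depth-first search over the "can begin with" relation.
        stack, seen = list(left_corners[start]), set()
        while stack:
            rule = stack.pop()
            if rule == start:
                found.add(start)
                break
            if rule not in seen:
                seen.add(rule)
                stack.extend(left_corners.get(rule, []))
    return found


print(sorted(left_recursive_rules(LEFT_CORNERS)))

# If primaryExpression's alternative becomes LPAREN formula RPAREN, it starts
# with a token, so formula is no longer one of its left corners.
fixed = dict(LEFT_CORNERS, primaryExpression=[])
print(sorted(left_recursive_rules(fixed)))
```

The first line printed lists the same five rules that error(210) reports. The second is an empty list, because primaryExpression no longer starts with formula.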
You're probably trying to allow parenthesized subexpressions through this chain of rules. In that case, make sure that path consumes the left parenthesis of the subexpression. Probably the formula alternative in primaryExpression should be changed to LPAREN formula RPAREN, and the rest of the grammar adjusted accordingly.
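The following is a minimal hand-written recursive-descent sketch in Python of what that change amounts to. It is not ANTLR-generated code. Function calls and the parenthesized formula are moved into the primary level, and every primary alternative consumes a token before recursing into formula, so no cycle remains. The tokenizer is deliberately simplified (no exponents, upper-case names only, unknown characters silently skipped). The tuple-shaped tree is an assumption made for illustration.

```python
import re

# Simplified tokenizer (an assumption, not the question's lexer): findall
# skips whitespace and, in this sketch, silently skips unknown characters.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Z][A-Z0-9]*|>=|<=|!=|[-+*/(),<>=]")
FUNCTIONS = {"MAX", "MIN", "AVERAGE"}
COMPARISONS = {">", "<", "=", ">=", "<=", "!="}


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        if self.i >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return self.tokens[self.i - 1]

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r}, got {self.peek()!r}")
        self.i += 1

    def formula(self):
        # additiveExpression: multiplicative (('+' | '-') multiplicative)*
        node = self.multiplicative()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.multiplicative())
        return node

    def multiplicative(self):
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.unary())
        return node

    def unary(self):
        if self.peek() == "-":
            self.next()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):
        # Every alternative consumes a token before recursing into formula,
        # which is what removes the left recursion.
        tok = self.next()
        if tok == "(":
            node = self.formula()
            self.expect(")")
            return node
        if tok in FUNCTIONS:
            self.expect("(")
            node = (tok, self.formula())
            self.expect(")")
            return node
        if tok == "IF":
            self.expect("(")
            condition = self.comparison()
            self.expect(",")
            then_value = self.formula()
            self.expect(",")
            else_value = self.formula()
            self.expect(")")
            return ("IF", condition, then_value, else_value)
        if tok[0].isalnum():  # number or variable name
            return tok
        raise SyntaxError(f"unexpected {tok!r}")

    def comparison(self):
        left = self.formula()
        op = self.next()
        if op not in COMPARISONS:
            raise SyntaxError(f"expected a comparison operator, got {op!r}")
        return (op, left, self.formula())


def parse(text):
    parser = Parser(text)
    tree = parser.formula()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} after the formula")
    return tree


print(parse("(A + MAX(B) ) / ( C - AVERAGE(A) )"))
```

Because unary sits below multiplicative, -A * 2 parses as (-A) * 2. An unbalanced input such as X / (MAX(X) is rejected with a SyntaxError.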

Related

Simple ANTLR grammar not disambiguating identifiers and literals

I'm trying to match this syntax:
P = 100
require P
credit account:subaccount P
The first is an assignment. The second is a "check" that P is truthy. The third is an instruction to move 100 to account:subaccount. The problem is that the grammar I've written thinks the third line is just an assignment with a missing equal sign. I can't see why.
program: (stmt NEWLINE)+;
stmt: require | entry;
require: 'require' filtrex;
entry: (CREDIT | DEBIT) JOURNAL filtrex (IF filtrex)? (LPARENHASH EXTID RPAREN)?;
assign: ID EQ filtrex;
filtrex: math;
math
: math (TIMES | DIV) math
| math (PLUS | MINUS) math
| LPAREN math RPAREN
| (PLUS | MINUS)* atom
;
atom: NUMBER
| ID
;
NUMBER
: ('0' .. '9') + ('.' ('0' .. '9') +)?
;
fragment SIGN
: ('+' | '-')
;
ID: [a-zA-Z]+[0-9a-zA-Z]*;
EQ: '=';
JOURNAL: [a-zA-Z:]+;
EXTID: [a-zA-Z0-9-]+;
COLON: ':';
CREDIT: 'credit';
DEBIT: 'debit';
IF: 'if';
NEWLINE : [\r\n];
NUM : [0-9.]+;
LPAREN: '(';
RPAREN: ')';
LPARENHASH: '(#';
PLUS: '+';
MINUS: '-';
TIMES: '*';
DIV: '/' ;
POINT: '.';
WS: [ \r\n\t] + -> skip;
UPDATE
Thanks to the suggestions below I have something that seems to work properly. Now to the implementation of the logic...
grammar Txl;
// High level language
program: stmt (NEWLINE stmt)* NEWLINE? EOF;
stmt: require | entry | assignment;
require: 'require' expr;
entry: (CREDIT | DEBIT) journal expr (IF expr)? (LPAREN 'id:' EXTID RPAREN)?;
assignment: IDENT ASSIGN expr;
journal: IDENT COLON IDENT;
expr: expr MULT expr
| expr DIV expr
| expr PLUS expr
| expr MINUS expr
| expr MOD expr
| expr POW expr
| MINUS expr
| expr AND expr
| expr OR expr
| NOT expr
| expr EQ expr
| expr NEQ expr
| expr LTE expr
| expr LT expr
| expr GTE expr
| expr GT expr
| expr QUESTION expr COLON expr
| LPAREN expr RPAREN
| NUMBER
| IDENT LPAREN args RPAREN
| IDENT
;
fnArg: expr | journal;
args: fnArg
| fnArg COMMA fnArg
|
;
// Reserved words
CREDIT: 'credit';
DEBIT: 'debit';
IF: 'if';
REQUIRE: 'require';
// Operators
MULT: '*';
DIV: '/';
MINUS: '-';
PLUS: '+';
POW: '^';
MOD: '%';
LPAREN: '(';
RPAREN: ')';
LBRACE: '[';
RBRACE: ']';
COMMA: ',';
EQ: '==';
NEQ: '!=';
GTE: '>=';
LTE: '<=';
GT: '>';
LT: '<';
ASSIGN: '=';
QUESTION: '?';
COLON: ':';
AND: 'and';
OR: 'or';
NOT: 'not';
HASH: '#';
NEWLINE : [\r\n];
WS: [ \t] + -> skip;
// Entities
NUMBER: ('0' .. '9') + ('.' ('0' .. '9') +)?;
IDENT: [a-zA-Z]+[0-9a-zA-Z]*;
EXTID: [a-zA-Z0-9-]+;
That is because the input credit is not being matched by your CREDIT rule, but by the ID rule. The lexer always tries to match as many characters as possible. So, the input credit can be matched by: ID, JOURNAL, EXTID and CREDIT. Whenever it happens that multiple rules can match the same characters, the one defined first "wins" (ID in this case). The lexer does not "listen" to what the parser is trying to match, it operates independently from the parser.
Note that EXTID also matches the input -, and because EXTID is defined before MINUS, the MINUS rule will never be matched.
The solution: place your keywords before the ID rule inside the grammar:
CREDIT : 'credit';
DEBIT : 'debit';
REQUIRE : 'require';
ID : [a-zA-Z]+ [0-9a-zA-Z]*;
And, if possible, I'd also remove the JOURNAL and EXTID lexer rules and try to "promote" them to parser rules:
journal
: ID COLON ID
;
extid
: ID (MINUS ID)*
;
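NUMBER and NUM can also match the same input, and NUM additionally matches input like 1........2.......22222...., which is not a number. I'd remove the NUM rule and keep only NUMBER.
Remove the \r\n part from WS: [ \r\n\t] + -> skip;. Line breaks are already matched by your NEWLINE rule, and because WS can match several characters at once (for example \r\n, or spaces followed by a line break), it can swallow the NEWLINE tokens your parser needs.
By using (stmt NEWLINE)+, every stmt must end with a newline, including the last one. This is probably a better solution: stmt (NEWLINE stmt)* NEWLINE?.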
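One way to see the difference is to abbreviate each statement to S and each NEWLINE token to N, and treat both program rules as regular expressions over that string. The following Python sketch does this, with fullmatch standing in for EOF.

```python
import re

# Each statement is abbreviated to "S" and each NEWLINE token to "N".
OLD_PROGRAM = re.compile(r"(SN)+")     # (stmt NEWLINE)+
NEW_PROGRAM = re.compile(r"S(NS)*N?")  # stmt (NEWLINE stmt)* NEWLINE? EOF


def accepts(pattern, symbols):
    # fullmatch plays the role of EOF: the whole input must be consumed.
    return pattern.fullmatch(symbols) is not None


for symbols in ["SNSN", "SNS", "SNNS"]:
    print(symbols, accepts(OLD_PROGRAM, symbols), accepts(NEW_PROGRAM, symbols))
```

Only the new rule accepts a file without a trailing newline (SNS). Neither form accepts two NEWLINE tokens in a row (SNNS). With a single-character NEWLINE : [\r\n]; rule, a blank line or a Windows \r\n line ending produces exactly that sequence.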
The grammar could look like this:
program
: stmt (NEWLINE stmt)* NEWLINE? EOF
;
stmt
: require
| entry
| assign
;
require
: REQUIRE filtrex
;
entry
: (CREDIT | DEBIT) journal filtrex (IF filtrex)? (LPARENHASH extid RPAREN)?
;
assign
: ID EQ filtrex
;
journal
: ID COLON ID
;
extid
: ID (MINUS ID)*
;
filtrex
: math
;
math
: math (TIMES | DIV) math
| math (PLUS | MINUS) math
| LPAREN math RPAREN
| (PLUS | MINUS)* atom
;
atom
: NUMBER
| ID
;
NUMBER : [0-9]+ ('.' [0-9]+)?;
CREDIT : 'credit';
DEBIT : 'debit';
REQUIRE : 'require';
IF : 'if';
ID : [a-zA-Z]+ [0-9a-zA-Z]*;
EQ : '=';
COLON : ':';
NEWLINE : [\r\n];
LPAREN : '(';
RPAREN : ')';
LPARENHASH : '(#';
PLUS : '+';
MINUS : '-';
TIMES : '*';
DIV : '/' ;
POINT : '.';
WS : [ \t] + -> skip;
which will parse your example input as intended (parse tree image not available).

Why parse failing after upgrading from Antlr 3 to Antlr 4?

I am currently trying to upgrade my project from ANTLR 3 to ANTLR 4. After changing the grammar file, equations that worked before no longer parse. I am new to ANTLR 4, so I can't tell whether my changes broke something.
Here is my original grammar file:
grammar equation;
options {
language=CSharp2;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
VARIABLE;
CONSTANT;
EXPR;
PAREXPR;
EQUATION;
UNARYEXPR;
FUNCTION;
BINARYOP;
LIST;
}
equationset: equation* EOF!;
equation: variable ASSIGN expression -> ^(EQUATION variable expression)
;
parExpression
: LPAREN expression RPAREN -> ^(PAREXPR expression)
;
expression
: conditionalexpression -> ^(EXPR conditionalexpression)
;
conditionalexpression
: orExpression
;
orExpression
: andExpression ( OR^ andExpression )*
;
andExpression
: comparisonExpression ( AND^ comparisonExpression )*;
comparisonExpression:
additiveExpression ((EQ^ | NE^ | LTE^ | GTE^ | LT^ | GT^) additiveExpression)*;
additiveExpression
: multiplicativeExpression ( (PLUS^ | MINUS^) multiplicativeExpression )*
;
multiplicativeExpression
: unaryExpression ( ( TIMES^ | DIVIDE^) unaryExpression )*
;
unaryExpression
: NOT unaryExpression -> ^(UNARYEXPR NOT unaryExpression)
| MINUS unaryExpression -> ^(UNARYEXPR MINUS unaryExpression)
| exponentexpression;
exponentexpression
: primary (CARET^ primary)*;
primary : parExpression | constant | booleantok | variable | function;
numeric: INTEGER | REAL;
constant: STRING -> ^(CONSTANT STRING) | numeric -> ^(CONSTANT numeric);
booleantok : BOOLEAN -> ^(BOOLEAN);
scopedidentifier
: (IDENTIFIER DOT)* IDENTIFIER -> IDENTIFIER+;
function
: scopedidentifier LPAREN argumentlist RPAREN -> ^(FUNCTION scopedidentifier argumentlist);
variable: scopedidentifier -> ^(VARIABLE scopedidentifier);
argumentlist: (expression) ? (COMMA! expression)*;
WS : (' '|'\r'|'\n'|'\t')+ {$channel=HIDDEN;};
COMMENT : '/*' .* '*/' {$channel=HIDDEN;};
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;};
STRING: (('\"') ( (~('\"')) )* ('\"'))+;
fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;
EQ : '==';
ASSIGN : '=';
NE : '!=' | '<>';
OR : 'or' | '||';
AND : 'and' | '&&';
NOT : '!'|'not';
LTE : '<=';
GTE : '>=';
LT : '<';
GT : '>';
TIMES : '*';
DIVIDE : '/';
BOOLEAN : 'true' | 'false';
IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;
REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;
PLUS : '+';
MINUS : '-';
COMMA : ',';
RPAREN : ')';
LPAREN : '(';
DOT : '.';
CARET : '^';
And here is what I have after my changes:
grammar equation;
options {
}
tokens {
VARIABLE;
CONSTANT;
EXPR;
PAREXPR;
EQUATION;
UNARYEXPR;
FUNCTION;
BINARYOP;
LIST;
}
equationset: equation* EOF;
equation: variable ASSIGN expression
;
parExpression
: LPAREN expression RPAREN
;
expression
: conditionalexpression
;
conditionalexpression
: orExpression
;
orExpression
: andExpression ( OR andExpression )*
;
andExpression
: comparisonExpression ( AND comparisonExpression )*;
comparisonExpression:
additiveExpression ((EQ | NE | LTE | GTE | LT | GT) additiveExpression)*;
additiveExpression
: multiplicativeExpression ( (PLUS | MINUS) multiplicativeExpression )*
;
multiplicativeExpression
: unaryExpression ( ( TIMES | DIVIDE) unaryExpression )*
;
unaryExpression
: NOT unaryExpression
| MINUS unaryExpression
| exponentexpression;
exponentexpression
: primary (CARET primary)*;
primary : parExpression | constant | booleantok | variable | function;
numeric: INTEGER | REAL;
constant: STRING | numeric;
booleantok : BOOLEAN;
scopedidentifier
: (IDENTIFIER DOT)* IDENTIFIER;
function
: scopedidentifier LPAREN argumentlist RPAREN;
variable: scopedidentifier;
argumentlist: (expression) ? (COMMA expression)*;
WS : (' '|'\r'|'\n'|'\t')+ ->channel(HIDDEN);
COMMENT : '/*' .* '*/' ->channel(HIDDEN);
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);
STRING: (('\"') ( (~('\"')) )* ('\"'))+;
fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;
EQ : '==';
ASSIGN : '=';
NE : '!=' | '<>';
OR : 'or' | '||';
AND : 'and' | '&&';
NOT : '!'|'not';
LTE : '<=';
GTE : '>=';
LT : '<';
GT : '>';
TIMES : '*';
DIVIDE : '/';
BOOLEAN : 'true' | 'false';
IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;
REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;
PLUS : '+';
MINUS : '-';
COMMA : ',';
RPAREN : ')';
LPAREN : '(';
DOT : '.';
CARET : '^';
A sample equation that I am trying to parse (which was working OK before) is:
[a].[b] = 1.76 * [Product_DC].[PDC_Inbound_Pallets] * if(product_dc.[PDC_DC] =="US84",1,0)
Thanks in advance.
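In the tokens section, token names must be separated with commas (,), not semicolons (;). See also the Tokens Section paragraph in the official documentation.
Since ANTLR 4.7, a backslash is not required to escape a double quote inside a literal. STRING: (('\"') ( (~('\"')) )* ('\"'))+; should be rewritten as STRING: ('"' ~'"'* '"')+;.
You are missing the question mark that makes the multi-line comment token non-greedy: '/*' .* '*/' should be '/*' .*? '*/'. In ANTLR 3, .* in a lexer rule was non-greedy by default. In ANTLR 4 it is greedy unless written as .*?.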
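Python's re module is only an analogy here, since ANTLR's lexer is not a regex engine. For these rules, though, greedy and non-greedy matching behave the same way. The sketch below shows how a greedy comment rule runs from the first /* to the last */, and how the simplified STRING rule stops at the closing quote. The + in that rule also accepts doubled quotes such as "a""b".

```python
import re

# Python regexes used as an analogy for the ANTLR 4 lexer rules.
GREEDY_COMMENT = re.compile(r"/\*.*\*/", re.DOTALL)       # '/*' .* '*/'
NON_GREEDY_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # '/*' .*? '*/'
STRING = re.compile(r'(?:"[^"]*")+')                      # ('"' ~'"'* '"')+

source = "x = 1 /* first */ + 2 /* second */"
print(GREEDY_COMMENT.findall(source))      # one token spanning both comments
print(NON_GREEDY_COMMENT.findall(source))  # two separate comment tokens
print(STRING.match('"US84",1,0)').group())
```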
So, the fixed grammar looks like this:
grammar equation;
options {
}
tokens {
VARIABLE,
CONSTANT,
EXPR,
PAREXPR,
EQUATION,
UNARYEXPR,
FUNCTION,
BINARYOP,
LIST
}
equationset: equation* EOF;
equation: variable ASSIGN expression
;
parExpression
: LPAREN expression RPAREN
;
expression
: conditionalexpression
;
conditionalexpression
: orExpression
;
orExpression
: andExpression ( OR andExpression )*
;
andExpression
: comparisonExpression ( AND comparisonExpression )*;
comparisonExpression:
additiveExpression ((EQ | NE | LTE | GTE | LT | GT) additiveExpression)*;
additiveExpression
: multiplicativeExpression ( (PLUS | MINUS) multiplicativeExpression )*
;
multiplicativeExpression
: unaryExpression ( ( TIMES | DIVIDE) unaryExpression )*
;
unaryExpression
: NOT unaryExpression
| MINUS unaryExpression
| exponentexpression;
exponentexpression
: primary (CARET primary)*;
primary : parExpression | constant | booleantok | variable | function;
numeric: INTEGER | REAL;
constant: STRING | numeric;
booleantok : BOOLEAN;
scopedidentifier
: (IDENTIFIER DOT)* IDENTIFIER;
function
: scopedidentifier LPAREN argumentlist RPAREN;
variable: scopedidentifier;
argumentlist: (expression) ? (COMMA expression)*;
WS : (' '|'\r'|'\n'|'\t')+ ->channel(HIDDEN);
COMMENT : '/*' .*? '*/' -> channel(HIDDEN);
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);
STRING: ('"' ~'"'* '"')+;
fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;
EQ : '==';
ASSIGN : '=';
NE : '!=' | '<>';
OR : 'or' | '||';
AND : 'and' | '&&';
NOT : '!'|'not';
LTE : '<=';
GTE : '>=';
LT : '<';
GT : '>';
TIMES : '*';
DIVIDE : '/';
BOOLEAN : 'true' | 'false';
IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;
REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;
PLUS : '+';
MINUS : '-';
COMMA : ',';
RPAREN : ')';
LPAREN : '(';
DOT : '.';
CARET : '^';

ParserRule matching the wrong token

I'm trying to learn a bit of ANTLR4 and define a grammar for a 4GL language.
This is what I've got:
compileUnit
:
typedeclaration EOF
;
typedeclaration
:
ID LPAREN DATATYPE INT RPAREN
;
DATATYPE
:
DATATYPE_ALPHANUMERIC
| DATATYPE_NUMERIC
;
DATATYPE_ALPHANUMERIC
:
'A'
;
DATATYPE_NUMERIC
:
'N'
;
fragment
DIGIT
:
[0-9]
;
fragment
LETTER
:
[a-zA-Z]
;
INT
:
DIGIT+
;
ID
:
LETTER
(
LETTER
| DIGIT
)*
;
LPAREN
:
'('
;
RPAREN
:
')'
;
WS
:
[ \t\f]+ -> skip
;
What I want to be able to parse:
TEST (A10)
what I get:
typedeclaration:1:6: mismatched input 'A10' expecting DATATYPE
I am however able to write:
TEST (A 10)
Why do I need to put a space here? LPAREN DATATYPE works on its own, so there should be no need for a space in between. INT RPAREN also works.
Why is a space needed between DATATYPE and INT? I'm a bit confused on that one.
I guess that it's matching ID because that is the "longest" match, but there must be some way to force the lexer to be lazier here, right?
You should exclude 'A' and 'N' as the first character of ID. As @CoronA noticed, ANTLR matches the longest possible token: the ID 'A10' is longer than the DATATYPE_ALPHANUMERIC 'A'. Also read this: Priority rules. Try the following grammar:
grammar expr;
compileUnit
: typedeclaration EOF
;
typedeclaration
: ID LPAREN datatype INT RPAREN
;
datatype
: DATATYPE_ALPHANUMERIC
| DATATYPE_NUMERIC
;
DATATYPE_ALPHANUMERIC
: 'A'
;
DATATYPE_NUMERIC
: 'N'
;
INT
: DIGIT+
;
ID
: [b-mo-zB-MO-Z] (LETTER | DIGIT)*
;
LPAREN
: '('
;
RPAREN
: ')'
;
WS
: [ \t\f]+ -> skip
;
fragment
DIGIT
: [0-9]
;
fragment
LETTER
: [a-zA-Z]
;
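To see why the space mattered and why this grammar helps, here is a small Python model of the lexer's longest-match rule, where ties go to the rule listed first. It is an illustration, not ANTLR itself. The patterns are transcriptions of the lexer rules in the question and in the grammar above.

```python
import re


def tokenize(text, rules):
    """Longest match wins; ties go to the rule listed first; WS is skipped."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in compiled:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:  # strictly longer keeps ties
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Lexer rules from the question, in order.
QUESTION_RULES = [
    ("DATATYPE", r"[AN]"),
    ("DATATYPE_ALPHANUMERIC", r"A"),
    ("DATATYPE_NUMERIC", r"N"),
    ("INT", r"[0-9]+"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"[ \t\f]+"),
]

# Lexer rules from the answer: ID may not start with A, N, a or n.
ANSWER_RULES = [
    ("DATATYPE_ALPHANUMERIC", r"A"),
    ("DATATYPE_NUMERIC", r"N"),
    ("INT", r"[0-9]+"),
    ("ID", r"[b-mo-zB-MO-Z][a-zA-Z0-9]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"[ \t\f]+"),
]

print(tokenize("TEST (A10)", QUESTION_RULES))
print(tokenize("TEST (A10)", ANSWER_RULES))
```

With the question's rules, A10 is a single ID, which explains the "mismatched input 'A10' expecting DATATYPE" error. With the answer's rules, it splits into DATATYPE_ALPHANUMERIC and INT. The trade-off is that an identifier such as NAME is no longer lexed as one ID.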
Alternatively, you can drop the ID restriction and assemble identifiers in the parser instead. Because the data types are defined before LETTER, 'A' and 'N' are recognized as data types first. This version is less clear, though:
grammar expr;
compileUnit
: typedeclaration EOF
;
typedeclaration
: id LPAREN datatype DIGIT+ RPAREN
;
id
: (datatype | LETTER) (datatype | LETTER | DIGIT)*
;
datatype
: DATATYPE_ALPHANUMERIC
| DATATYPE_NUMERIC
;
DATATYPE_ALPHANUMERIC: 'A';
DATATYPE_NUMERIC: 'N';
// List with another Data types.
LETTER: [a-zA-Z];
LPAREN
: '('
;
RPAREN
: ')'
;
WS
: [ \t\f]+ -> skip
;
DIGIT
: [0-9]
;

Trying to resolve left-recursion trying to build Parser with ANTLR

I’m currently trying to build a parser for the Oberon language using ANTLR and Eclipse.
This is what I have got so far:
grammar oberon;
options
{
language = Java;
//backtrack = true;
output = AST;
}
@parser::header {package dhbw.Oberon;}
@lexer::header {package dhbw.Oberon; }
T_ARRAY : 'ARRAY' ;
T_BEGIN : 'BEGIN';
T_CASE : 'CASE' ;
T_CONST : 'CONST' ;
T_DO : 'DO' ;
T_ELSE : 'ELSE' ;
T_ELSIF : 'ELSIF' ;
T_END : 'END' ;
T_EXIT : 'EXIT' ;
T_IF : 'IF' ;
T_IMPORT : 'IMPORT' ;
T_LOOP : 'LOOP' ;
T_MODULE : 'MODULE' ;
T_NIL : 'NIL' ;
T_OF : 'OF' ;
T_POINTER : 'POINTER' ;
T_PROCEDURE : 'PROCEDURE' ;
T_RECORD : 'RECORD' ;
T_REPEAT : 'REPEAT' ;
T_RETURN : 'RETURN';
T_THEN : 'THEN' ;
T_TO : 'TO' ;
T_TYPE : 'TYPE' ;
T_UNTIL : 'UNTIL' ;
T_VAR : 'VAR' ;
T_WHILE : 'WHILE' ;
T_WITH : 'WITH' ;
module : T_MODULE ID SEMI importlist? declarationsequence?
(T_BEGIN statementsequence)? T_END ID PERIOD ;
importlist : T_IMPORT importitem (COMMA importitem)* SEMI ;
importitem : ID (ASSIGN ID)? ;
declarationsequence :
( T_CONST (constantdeclaration SEMI)*
| T_TYPE (typedeclaration SEMI)*
| T_VAR (variabledeclaration SEMI)*)
(proceduredeclaration SEMI | forwarddeclaration SEMI)*
;
constantdeclaration: identifierdef EQUAL expression ;
identifierdef: ID MULT? ;
expression: simpleexpression (relation simpleexpression)? ;
simpleexpression : (PLUS|MINUS)? term (addoperator term)* ;
term: factor (muloperator factor)* ;
factor: number
| stringliteral
| T_NIL
| set
| designator '(' explist? ')'
;
number: INT | HEX ; // TODO add real
stringliteral : '"' ( ~('\\'|'"') )* '"' ;
set: '{' elementlist? '}' ;
elementlist: element (COMMA element)* ;
element: expression (RANGESEP expression)? ;
designator: qualidentifier
('.' ID
| '[' explist ']'
| '(' qualidentifier ')'
| UPCHAR )+
;
explist: expression (COMMA expression)* ;
actualparameters: '(' explist? ')' ;
muloperator: MULT | DIV | MOD | ET ;
addoperator: PLUS | MINUS | OR ;
relation: EQUAL ; // TODO
typedeclaration: ID EQUAL type ;
type: qualidentifier
| arraytype
| recordtype
| pointertype
| proceduretype
;
qualidentifier: (ID '.')* ID ;
arraytype: T_ARRAY expression (',' expression) T_OF type;
recordtype: T_RECORD ('(' qualidentifier ')')? fieldlistsequence T_END ;
fieldlistsequence: fieldlist (SEMI fieldlist) ;
fieldlist: (identifierlist COLON type)? ;
identifierlist: identifierdef (COMMA identifierdef)* ;
pointertype: T_POINTER T_TO type ;
proceduretype: T_PROCEDURE formalparameters? ;
variabledeclaration: identifierlist COLON type ;
proceduredeclaration: procedureheading SEMI procedurebody ID ;
procedureheading: T_PROCEDURE MULT? identifierdef formalparameters? ;
formalparameters: '(' params? ')' (COLON qualidentifier)? ;
params: fpsection (SEMI fpsection)* ;
fpsection: T_VAR? idlist COLON formaltype ;
idlist: ID (COMMA ID)* ;
formaltype: (T_ARRAY T_OF)* (qualidentifier | proceduretype);
procedurebody: declarationsequence (T_BEGIN statementsequence)? T_END ;
forwarddeclaration: T_PROCEDURE UPCHAR? ID MULT? formalparameters? ;
statementsequence: statement (SEMI statement)* ;
statement : assignment
| procedurecall
| ifstatement
| casestatement
| whilestatement
| repeatstatement
| loopstatement
| withstatement
| T_EXIT
| T_RETURN expression?
;
assignment: designator ASSIGN expression ;
procedurecall: designator actualparameters? ;
ifstatement: T_IF expression T_THEN statementsequence
(T_ELSIF expression T_THEN statementsequence)*
(T_ELSE statementsequence)? T_END ;
casestatement: T_CASE expression T_OF caseitem ('|' caseitem)*
(T_ELSE statementsequence)? T_END ;
caseitem: caselabellist COLON statementsequence ;
caselabellist: caselabels (COMMA caselabels)* ;
caselabels: expression (RANGESEP expression)? ;
whilestatement: T_WHILE expression T_DO statementsequence T_END ;
repeatstatement: T_REPEAT statementsequence T_UNTIL expression ;
loopstatement: T_LOOP statementsequence T_END ;
withstatement: T_WITH qualidentifier COLON qualidentifier T_DO statementsequence T_END ;
ID : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ;
fragment DIGIT : '0'..'9' ;
INT : ('-')?DIGIT+ ;
fragment HEXDIGIT : '0'..'9'|'A'..'F' ;
HEX : HEXDIGIT+ 'H' ;
ASSIGN : ':=' ;
COLON : ':' ;
COMMA : ',' ;
DIV : '/' ;
EQUAL : '=' ;
ET : '&' ;
MINUS : '-' ;
MOD : '%' ;
MULT : '*' ;
OR : '|' ;
PERIOD : '.' ;
PLUS : '+' ;
RANGESEP : '..' ;
SEMI : ';' ;
UPCHAR : '^' ;
WS : ( ' ' | '\t' | '\r' | '\n'){skip();};
My problem is that when I check the grammar, I get the following error, and I just can’t find an appropriate way to fix it:
rule statement has non-LL(*) decision
due to recursive rule invocations reachable from alts 1,2.
Resolve by left-factoring or using syntactic predicates
or using backtrack=true option.
|---> statement : assignment
I also have the same problem with declarationsequence and simpleexpression.
When I use options { … backtrack = true; … } it at least compiles, but obviously doesn’t work right anymore when I run a test-file, but I can’t find a way to resolve the left-recursion on my own (or maybe I’m just too blind at the moment because I’ve looked at this for far too long now). Any ideas how I could change the lines where the errors occurs to make it work?
EDIT
I was able to fix one of the three problems: statement works now. The problem was that assignment and procedurecall both start with designator.
statement : procedureassignmentcall
| ifstatement
| casestatement
| whilestatement
| repeatstatement
| loopstatement
| withstatement
| T_EXIT
| T_RETURN expression?
;
procedureassignmentcall : (designator ASSIGN)=> assignment | procedurecall;
assignment: designator ASSIGN expression ;
procedurecall: designator actualparameters? ;
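ANTLR 3 implements a syntactic predicate such as (designator ASSIGN)=> by parsing that fragment speculatively and then rewinding the input. The Python sketch below imitates that decision by hand. It is not generated code. The hypothetical designator and expression are heavily simplified: no '[' explist ']' or '(' qualidentifier ')' selectors, and a one-token expression. This keeps the focus on the decision itself.

```python
class Parser:
    """Hypothetical, heavily simplified parser for Oberon statements.

    Tokens are plain strings; designator is reduced to ID ('.' ID | '^')*
    and expression to a single identifier or integer literal.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<EOF>"]
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.i += 1

    def identifier(self):
        tok = self.peek()
        if not tok.isidentifier():
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.i += 1
        return tok

    def designator(self):
        parts = [self.identifier()]
        while self.peek() in (".", "^"):
            if self.peek() == ".":
                self.match(".")
                parts.append(self.identifier())
            else:
                self.match("^")
                parts.append("^")
        return parts

    def expression(self):
        tok = self.peek()
        if not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"expected expression, got {tok!r}")
        self.i += 1
        return tok

    def actual_parameters(self):
        args = []
        if self.peek() == "(":
            self.match("(")
            if self.peek() != ")":
                args.append(self.expression())
                while self.peek() == ",":
                    self.match(",")
                    args.append(self.expression())
            self.match(")")
        return args

    def speculate(self, fragment):
        """Syntactic predicate: try `fragment`, report success, always rewind."""
        start = self.i
        try:
            fragment()
            return True
        except SyntaxError:
            return False
        finally:
            self.i = start

    def statement(self):
        # procedureassignmentcall : (designator ASSIGN)=> assignment | procedurecall ;
        def designator_then_assign():
            self.designator()
            self.match(":=")

        if self.speculate(designator_then_assign):
            target = self.designator()
            self.match(":=")
            return ("assign", target, self.expression())
        return ("call", self.designator(), self.actual_parameters())


print(Parser(["a", ".", "b", ":=", "1"]).statement())
print(Parser(["Out", ".", "Write", "(", "x", ")"]).statement())
```

The designator is parsed twice when the predicate succeeds. This is the cost of backtracking that left-factoring avoids.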

ANTLR grammar error

I'm trying to build a C-- compiler using ANTLR 3.4.
The full grammar is listed here:
program : (vardeclaration | fundeclaration)* ;
vardeclaration : INT ID (OPENSQ NUM CLOSESQ)? SEMICOL ;
fundeclaration : typespecifier ID OPENP params CLOSEP compoundstmt ;
typespecifier : INT | VOID ;
params : VOID | paramlist ;
paramlist : param (COMMA param)* ;
param : INT ID (OPENSQ CLOSESQ)? ;
compoundstmt : OPENCUR vardeclaration* statement* CLOSECUR ;
statementlist : statement* ;
statement : expressionstmt | compoundstmt | selectionstmt | iterationstmt | returnstmt;
expressionstmt : (expression)? SEMICOL;
selectionstmt : IF OPENP expression CLOSEP statement (options {greedy=true;}: ELSE statement)?;
iterationstmt : WHILE OPENP expression CLOSEP statement;
returnstmt : RETURN (expression)? SEMICOL;
expression : (var EQUAL expression) | sampleexpression;
var : ID ( OPENSQ expression CLOSESQ )? ;
sampleexpression: addexpr ( ( LOREQ | LESS | GRTR | GOREQ | EQUAL | NTEQL) addexpr)?;
addexpr : mulexpr ( ( PLUS | MINUS ) mulexpr)*;
mulexpr : factor ( ( MULTI | DIV ) factor )*;
factor : ( OPENP expression CLOSEP ) | var | call | NUM;
call : ID OPENP arglist? CLOSEP;
arglist : expression ( COMMA expression)*;
These are the lexer rules I used:
ELSE : 'else' ;
IF : 'if' ;
INT : 'int' ;
RETURN : 'return' ;
VOID : 'void' ;
WHILE : 'while' ;
PLUS : '+' ;
MINUS : '-' ;
MULTI : '*' ;
DIV : '/' ;
LESS : '<' ;
LOREQ : '<=' ;
GRTR : '>' ;
GOREQ : '>=' ;
EQUAL : '==' ;
NTEQL : '!=' ;
ASSIGN : '=' ;
SEMICOL : ';' ;
COMMA : ',' ;
OPENP : '(' ;
CLOSEP : ')' ;
OPENSQ : '[' ;
CLOSESQ : ']' ;
OPENCUR : '{' ;
CLOSECUR: '}' ;
SCOMMENT: '/*' ;
ECOMMENT: '*/' ;
ID : ('a'..'z' | 'A'..'Z')+/*(' ')*/ ;
NUM : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;};
COMMENT: '/*' .* '*/' {$channel = HIDDEN;};
But when I try to save it, I get this error:
error(211): /CMinusMinus/src/CMinusMinus/CMinusMinus.g:33:13: [fatal] rule expression has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
|---> expression : (var EQUAL expression) | sampleexpression;
1 error
How can I resolve this problem?
As already mentioned, your expression rule is ambiguous: both of its alternatives can start with a var.
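You need to "help" your parser a bit. If the parser sees a var followed by an assignment operator, it should choose alternative 1; otherwise it should choose alternative 2. This can be done with a syntactic predicate (the (var ASSIGN)=> part in the rule below). Note also that your rule uses EQUAL, which is your '==' comparison token; an assignment presumably needs ASSIGN ('='):
expression
: (var ASSIGN)=> var ASSIGN expression
| sampleexpression
;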
More about predicates in this Q&A: What is a 'semantic predicate' in ANTLR?
The problem is this:
expression : (var EQUAL expression) | sampleexpression;
where you either start with var or with sampleexpression. But sampleexpression can also start with a var: sampleexpression -> addexpr -> mulexpr -> factor -> var.
Since var can itself contain a nested expression (ID OPENSQ expression CLOSESQ), no fixed lookahead of k tokens is enough for ANTLR to choose between the two alternatives.
As the error message suggests, you can set backtrack=true and see whether that solves your problem. However, it might not give you the AST or parse trees you expect, and it can be slow on some inputs.
You could also try to refactor your grammar to avoid such recursions.
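One such refactoring is left-factoring: parse a sampleexpression first and, only if it turns out to be a var followed by '=', treat it as the left-hand side of an assignment. The common prefix is then parsed once, so no predicate or backtracking is needed. Below is a minimal sketch of the idea as a Python recursive-descent parser rather than ANTLR. All names are illustrative. It assumes assignment uses '=' (the ASSIGN token) and reduces sampleexpression to + and - over numbers, vars and parenthesized expressions.

```python
class Parser:
    """Illustrative parser: expression : sampleexpression ('=' expression)? ;"""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<EOF>"]
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.i += 1

    def expression(self):
        left = self.addexpr()  # the shared prefix is parsed exactly once
        if self.peek() == "=":
            if left[0] != "var":
                raise SyntaxError("left-hand side of '=' must be a var")
            self.match("=")
            return ("assign", left, self.expression())  # right-associative
        return left

    def addexpr(self):
        node = self.factor()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.match(op)
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.match("(")
            node = self.expression()
            self.match(")")
            return ("group", node)  # a parenthesized var is not assignable
        if tok.isdigit():
            self.match(tok)
            return ("num", int(tok))
        if tok.isidentifier():
            self.match(tok)
            index = None
            if self.peek() == "[":
                self.match("[")
                index = self.expression()
                self.match("]")
            return ("var", tok, index)
        raise SyntaxError(f"unexpected {tok!r}")


def parse_expression(tokens):
    parser = Parser(tokens)
    tree = parser.expression()
    parser.match("<EOF>")
    return tree


print(parse_expression(["a", "[", "i", "]", "=", "b", "=", "1", "+", "2"]))
```

In ANTLR terms, this corresponds roughly to expression : sampleexpression (ASSIGN expression)? ;, with a check in an action or tree walker that the left side is a var.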