How to resolve "The following alternatives can never be matched" - antlr

I have been struggling to resolve a "multiple alternatives" error in my parser for a couple of days now but with no success. I have been converting Bart Kiers excellent Tiny Language(TL) tutorial code to C# using Sam Harwell's port of ANTLR3 and VS2010. Kudos to both these guys for their excellent work. I believe I have followed Bart's tutorial accurately but as I am a newbie with ANTLR I can't be sure.
I did have the TL code working nicely on a pure math basis i.e. no "functions" or "if then else" or "while" (see screenshot of a little app)
but when I added the code for the missing pieces to complete the tutorial I get a parsing error in "functionCall" and in "list" (see the code below)
grammar Paralex2;
options {
language=CSharp3;
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BLOCK;
RETURN;
STATEMENTS;
ASSIGNMENT;
FUNC_CALL;
EXP;
EXP_LIST;
ID_LIST;
IF;
TERNARY;
U_SUB;
NEGATE;
FUNCTION;
INDEXES;
LIST;
LOOKUP;
}
#lexer::namespace{Paralex2}
#parser::namespace{Paralex2}
/*
* Parser Rules
*/
#parser::header {using System; using System.Collections.Generic;}
#parser::members{
public SortedList<string, Function> functions = new SortedList<string, Function>();
private void defineFunction(string id, Object idList, Object block) {
// `idList` is possibly null! Create an empty tree in that case.
CommonTree idListTree = idList == null ? new CommonTree() : (CommonTree)idList;
// `block` is never null.
CommonTree blockTree = (CommonTree)block;
// The function name with the number of parameters after it the unique key
string key = id + idListTree.Children.Count();
functions.Add(key, new Function(id, idListTree, blockTree));
}
}
public parse
: block EOF -> block
;
block
: (statement | functionDecl)* (Return exp ';')? -> ^(BLOCK ^(STATEMENTS statement*) ^(RETURN exp?))
;
statement
: assignment ';' -> assignment
| functionCall ';' -> functionCall
| ifStatement
| forStatement
| whileStatement
;
assignment
: Identifier indexes? '=' exp
-> ^(ASSIGNMENT Identifier indexes? exp)
;
functionCall
: Identifier '(' expList? ')' -> ^(FUNC_CALL Identifier expList?)
| Assert '(' exp ')' -> ^(FUNC_CALL Assert exp)
| Size '(' exp ')' -> ^(FUNC_CALL Size exp)
;
ifStatement
: ifStat elseIfStat* elseStat? End -> ^(IF ifStat elseIfStat* elseStat?)
;
ifStat
: If exp Do block -> ^(EXP exp block)
;
elseIfStat
: Else If exp Do block -> ^(EXP exp block)
;
elseStat
: Else Do block -> ^(EXP block)
;
functionDecl
: Def Identifier '(' idList? ')' block End
{defineFunction($Identifier.text, $idList.tree, $block.tree);}
;
forStatement
: For Identifier '=' exp To exp Do block End
-> ^(For Identifier exp exp block)
;
whileStatement
: While exp Do block End -> ^(While exp block)
;
idList
: Identifier (',' Identifier)* -> ^(ID_LIST Identifier+)
;
expList
: exp (',' exp)* -> ^(EXP_LIST exp+)
;
exp
: condExp
;
condExp
: (orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
orExp
: andExp ('||'^ andExp)*
;
andExp
: equExp ('&&'^ equExp)*
;
equExp
: relExp (('==' | '!=')^ relExp)*
;
relExp
: addExp (('>=' | '<=' | '>' | '<')^ addExp)*
;
addExp
: mulExp ((Add | Sub)^ mulExp)*
;
mulExp
: powExp ((Mul | Div)^ powExp)*
;
powExp
: unaryExp ('^'^ unaryExp)*
;
unaryExp
: Sub atom -> ^(U_SUB atom)
| '!' atom -> ^(NEGATE atom)
| atom
;
atom
: Nmber
| Bool
| Null
| lookup
;
list
: '[' expList? ']' -> ^(LIST expList?)
;
lookup
: list indexes? -> ^(LOOKUP list indexes?)
| functionCall indexes? -> ^(LOOKUP functionCall indexes?)
| Identifier indexes? -> ^(LOOKUP Identifier indexes?)
| String indexes? -> ^(LOOKUP String indexes?)
| '(' exp ')' indexes? -> ^(LOOKUP exp indexes?)
;
indexes
: ('[' exp ']')+ -> ^(INDEXES exp+)
;
/*
* Lexer Rules
*/
Assert : 'assert';
Size : 'size';
Def : 'def';
If : 'if';
Else : 'else';
Return : 'return';
For : 'for';
While : 'while';
To : 'to';
Do : 'do';
End : 'end';
In : 'in';
Null : 'null';
Or : '||';
And : '&&';
Equals : '==';
NEquals : '!=';
GTEquals : '>=';
LTEquals : '<=';
Pow : '^';
GT : '>';
LT : '<';
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Modulus : '%';
OBrace : '{';
CBrace : '}';
OBracket : '[';
CBracket : ']';
OParen : '(';
CParen : ')';
SColon : ';';
Assign : '=';
Comma : ',';
QMark : '?';
Colon : ':';
Bool
: 'true'
| 'false'
;
Nmber
: Int ('.' Digit*)?
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | Digit)*
;
String
#after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {Skip();}
| '/*' .* '*/' {Skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {Skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
The error messages I get are
Decision can match input such as "CParen" using multiple alternatives: 1, 2 : Line 79:20
and
Decision can match input such as "CBracket" using multiple alternatives: 1, 2 : Line 176:10
The errors relate to the functionCall and list rules. I have examined the parser file in ANTLRWorks 1.5 and confirmed the same errors there. The syntax diagrams for the two rules look like this;
and this;
I have tried several changes to try to solve the problem but I don't seem to be able to get the syntax right. I would appreciate any help you guys could provide and can email the images if that would help.
Thanks in advance
Ian Carson

You have an OR-operator too many in the condExp rule making the grammar ambiguous.
You have:
condExp
: ( orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:
But it should be:
condExp
: ( orExp -> orExp)
( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:

Related

Antlr4 ignoring newlines at all but one point

I am writing a parser for a scripting language, and using antlr 4.5.3 for the purpose.
grammar VSE;
chunk
: block* EOF
;
block
: var '=' exp
| functioncall
;
var
: NAME
| var '[' exp ']'
| var '.' var
;
exp
: number
| string
| var
| functioncall
| <assoc=right> exp exp //concat
;
functioncall
: NAME '(' (exp)? (',' exp)* ')'
| var '.' functioncall
;
string
: '"' (~('"' | '\\' | '\r' | '\n') | '\\' ('"' | '\\'))* '"'
;
NAME
: [a-zA-Z_][a-zA-Z_0-9]*
;
number
: INT | HEX | FLOAT
;
INT
: Digit+
;
HEX
: '0' [xX] [0-9a-fA-F]+
;
FLOAT
: Digit* '.' Digit+
;
Digit
: [0-9]
;
WS
: [ \t\u000C\r\n]+ -> skip
;
However, while testing it, I found a variable assignment like var = something followed by some function call in next line leads to a concat statement. (My concat statement is a variable followed by another like var = var1 var2) I understand that antlr is skipping ALL the new lines in favor of line continuation, but I'd like to add the condition that if there is a new line between two exps, it would treat them as two separate blocks instead of a concat statement. i.e.
var = var2
functioncall(var)
These should be two separate blocks instead of concat statement.
Is there any way to do this?
Does the following rule suitable for you?
block
: var '=' exp NEW_LINE
| functioncall NEW_LINE
;
NEW_LINE: '\r'? '\n'
WS
: [ \t]+ -> skip
;
In another case you should use Semantic Predicates or very unclear grammar.

My Antlr4 Grammar doesn't understand math expressions

i have this Grammar
grammar Arith;
exp : LPAREN exp RPAREN
| fun
| num
| exp (OP exp)+
;
num : LPAREN num RPAREN
| LESS num
| INT
| INT 'b'
| '0x' INT
;
fun : LPAREN fun RPAREN
| LESS fun
| FUN_TXT LPAREN exp RPAREN
| 'pow' LPAREN exp ',' exp RPAREN
;
INT : ('0'..'9')+ ;
LPAREN : '(' ;
RPAREN : ')' ;
FUN_TXT : 'log' | 'acos' | 'asin' | 'atan' | 'cos' | 'abs' | 'sin' | 'sqrt' | 'tan' ;
OP : ADD | LESS | MUL | DIV | MOD ;
ADD : '+' ;
LESS : '-' ;
MUL : '*' ;
DIV: '/' ;
MOD: '%' ;
WS : [ \t\r\n] -> skip ;
I try to insert sin(-1) but the lexer said me "no viable alternative at input '-'".
I think that program translate it in "exp -> exp (OP exp)+" instead of "exp -> fun(num) -> fun(LESS num)"
Could someone help me to understand what i've forget and how change my rules in the right way?
Thanks
First I would simplify your rules for num and fun
num : INT
| INT 'b'
| '0x' INT
;
fun : FUN_TXT LPAREN exp RPAREN
| 'pow' LPAREN exp ',' exp RPAREN
;
Brackets and minuses are handled by the exp rule.
You also need to separate the ADD and SUB from the multiplicative operators to get precedence right. The calculator example for the Antlr grammars uses
expression
: multiplyingExpression ((PLUS|MINUS) multiplyingExpression)*
;
multiplyingExpression
: powExpression ((TIMES|DIV) powExpression)*
;
powExpression
: atom (POW expression)?
;
atom
: scientific
| variable
| LPAREN expression RPAREN
| func
;
scientific
: number (E number)?
;
func
: funcname LPAREN expression RPAREN
;
I would be inclined to start from that.

Precedence in Antlr using parentheses

We are developing a DSL, and we're facing some problems:
Problem 1:
In our DSL, it's allowed to do this:
A + B + C
but not this:
A + B - C
If the user needs to use two or more different operators, he'll need to insert parentheses:
A + (B - C) or (A + B) - C.
Problem 2:
In our DSL, the most precedent operator must be surrounded by parentheses.
For example, instead of using this way:
A + B * C
The user needs to use this:
A + (B * C)
To solve the Problem 1 I've got a snippet of ANTLR that worked, but I'm not sure if it's the best way to solve it:
sumExpr
#init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr;
To solve the Problem 2, I have no idea where to start.
I appreciate your help to find out a better solution to the first problem and a direction to solve the seconde one. (Sorry for my bad english)
Below is the grammar that we have developed:
grammar TclGrammar;
options {
output=AST;
ASTLabelType=CommonTree;
}
#members {
public boolean isSum(String type) {
System.out.println("Tipo: " + type);
return "+".equals(type);
}
public boolean isSub(String type) {
System.out.println("Tipo: " + type);
return "-".equals(type);
}
}
prog
: exprMain ';' {System.out.println(
$exprMain.tree == null ? "null" : $exprMain.tree.toStringTree());}
;
exprMain
: exprQuando? (exprDeveSatis | exprDeveFalharCaso)
;
exprDeveSatis
: 'DEVE SATISFAZER' '{'! expr '}'!
;
exprDeveFalharCaso
: 'DEVE FALHAR CASO' '{'! expr '}'!
;
exprQuando
: 'QUANDO' '{'! expr '}'!
;
expr
: logicExpr
;
logicExpr
: boolExpr (('E'|'OU')^ boolExpr)*
;
boolExpr
: comparatorExpr
| emExpr
| 'VERDADE'
| 'FALSO'
;
emExpr
: FIELD 'EM' '[' (variable_lista | field_lista) comCruzamentoExpr? ']'
-> ^('EM' FIELD (variable_lista+)? (field_lista+)? comCruzamentoExpr?)
;
comCruzamentoExpr
: 'COM CRUZAMENTO' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COM CRUZAMENTO' FIELD+)
;
comparatorExpr
: sumExpr (('<'^|'<='^|'>'^|'>='^|'='^|'<>'^) sumExpr)?
| naoPreenchidoExpr
| preenchidoExpr
;
naoPreenchidoExpr
: FIELD 'NAO PREENCHIDO' -> ^('NAO PREENCHIDO' FIELD)
;
preenchidoExpr
: FIELD 'PREENCHIDO' -> ^('PREENCHIDO' FIELD)
;
sumExpr
#init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr
;
multExpr
: funcExpr(('*'^|'/'^) funcExpr)?
;
funcExpr
: 'QUANTIDADE'^ '('! FIELD ')'!
| 'EXTRAI_TEXTO'^ '('! FIELD ';' INTEGER ';' INTEGER ')'!
| cruzaExpr
| 'COMBINACAO_UNICA' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COMBINACAO_UNICA' FIELD+)
| 'EXISTE'^ '('! FIELD ')'!
| 'UNICO'^ '('! FIELD ')'!
| atom
;
cruzaExpr
: operadorCruzaExpr ('CRUZA COM'^|'CRUZA AMBOS'^) operadorCruzaExpr ondeExpr?
;
operadorCruzaExpr
: FIELD('('!field_lista')'!)?
;
ondeExpr
: ('ONDE'^ '('!expr')'!)
;
atom
: FIELD
| VARIABLE
| '('! expr ')'!
;
field_lista
: FIELD(';' field_lista)?
;
variable_lista
: VARIABLE(';' variable_lista)?
;
FIELD
: NONCONTROL_CHAR+
;
NUMBER
: INTEGER | FLOAT
;
VARIABLE
: '\'' NONCONTROL_CHAR+ '\''
;
fragment SIGN: '+' | '-';
fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SYMBOL: '_' | '.' | ',';
fragment FLOAT: INTEGER '.' '0'..'9'*;
fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {skip();}
;
This is similar to not having operator precedence at all.
expr
: funcExpr
( ('+' funcExpr)*
| ('-' funcExpr)*
| ('*' funcExpr)*
| ('/' funcExpr)*
)
;
I think the following should work. I'm assuming some lexer tokens with obvious names.
expr: sumExpr;
sumExpr: onlySum | subExpr;
onlySum: atom ( PLUS onlySum )?;
subExpr: onlySub | multExpr;
onlySub: atom ( MINUS onlySub )? ;
multExpr: atom ( STAR atomic )? ;
parenExpr: OPEN_PAREN expr CLOSE_PAREN;
atom: FIELD | VARIABLE | parenExpr
The only* rules match an expression if it only has one type of operator outside of parentheses. The *Expr rules match either a line with the appropriate type of operators or go to the next operator.
If you have multiple types of operators, then they are forced to be inside parentheses because the match will go through atom.

adding (...) {...} function literals while abstaining from backtracking

Building off the answer found in How to have both function calls and parenthetical grouping without backtrack, I'd like to add function literals which are in a non LL(*) means implemented like
...
tokens {
...
FN;
ID_LIST;
}
stmt
: expr SEMI // SEMI=';'
;
callable
: ...
| fn
;
fn
: OPAREN opt_id_list CPAREN compound_stmt
-> ^(FN opt_id_list compound_stmt)
;
compound_stmt
: OBRACE stmt* CBRACE
opt_id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
What I'd like to do is allow anonymous function literals that have an argument list (e.g. () or (a) or (a, b, c)) followed by a compound_stmt. So (a, b, c){...} is good. But (x)(y){} not so much. (Of course (x) * (y){} is "valid" in terms of the parser, just as ((y){})()[1].x would be.)
The parser needs a bit of extra look ahead. I guess it could be done without it, but it would definitely result in some horrible looking parser rule(s) that are a pain to maintain and a parser that would accept (a, 2, 3){...} (a function literal with an expression-list instead of an id-list), for example. This would cause you to do quite a bit of semantic checking after the AST has been created.
The (IMO) best way to solve this is by adding the function literal rule in the callable and adding a syntactic predicate in front of it which will tell the parser to make sure there really is such an alternative before actually matching it.
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
A demo:
grammar T;
options {
output=AST;
}
tokens {
// literal tokens
EQ = '==' ;
GT = '>' ;
LT = '<' ;
GTE = '>=' ;
LTE = '<=' ;
LAND = '&&' ;
LOR = '||' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
OPAREN = '(' ;
CPAREN = ')' ;
OBRACK = '[' ;
CBRACK = ']' ;
DOT = '.' ;
COMMA = ',' ;
OBRACE = '{' ;
CBRACE = '}' ;
SEMI = ';' ;
// imaginary tokens
CALL;
INDEX;
LOOKUP;
UNARY_MINUS;
PARAMS;
FN;
ID_LIST;
STATS;
}
prog
: expr EOF -> expr
;
expr
: boolExpr
;
boolExpr
: relExpr ((LAND | LOR)^ relExpr)?
;
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
addExpr
: mulExpr ((PLUS | MINUS)^ mulExpr)*
;
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
atomExpr
: INT
| call
;
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
fn_literal
: OPAREN id_list CPAREN compound_stmt -> ^(FN id_list compound_stmt)
;
id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
compound_stmt
: OBRACE stmt* CBRACE -> ^(STATS stmt*)
;
stmt
: expr SEMI
;
relOp
: EQ | GT | LT | GTE | LTE
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
SPACE : (' ' | '\t') {skip();};
A parser generated by the grammar above would reject the input (x)(y){} while it properly parses the following 3 snippets of code:
1
(a, b, c){ a+b*c; }
2
(x) * (y){ x.y; }
3
((y){})()[1].x

Solving antlr left recursion

I'm trying to parse a language using ANTLR which can contain the following syntax:
someVariable, somVariable.someMember, functionCall(param).someMember, foo.bar.baz(bjork).buffalo().xyzzy
This is the ANTLR grammar which i've come up with so far, and the access_operation throws the error
The following sets of rules are mutually left-recursive [access_operation, expression]:
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
LHS;
RHS;
CALL;
PARAMS;
}
start
: body? EOF
;
body
: expression (',' expression)*
;
expression
: function -> ^(CALL)
| access_operation
| atom
;
access_operation
: (expression -> ^(LHS)) '.'! (expression -> ^(RHS))
;
function
: (IDENT '(' body? ')') -> ^(IDENT PARAMS?)
;
atom
: IDENT
| NUMBER
;
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : '0'..'9';
IDENT : (LETTER)+ ;
NUMBER : (DIGIT)+ ;
SPACE : (' ' | '\t' | '\r' | '\n') { $channel=HIDDEN; };
What i could manage so far was to refactor the access_operation rule to '.' expression which generates an AST where the access_operation node only contains the right side of the operation.
What i'm looking for instead is something like this:
How can the left-recursion problem solved in this case?
By "wrong AST" I'll make a semi educated guess that, for input like "foo.bar.baz", you get an AST where foo is the root with bar as a child who in its turn has baz as a child, which is a leaf in the AST. You may want to have this reversed. But I'd not go for such an AST if I were you: I'd keep the AST as flat as possible:
foo
/ | \
/ | \
bar baz ...
That way, evaluating is far easier: you simply look up foo, and then walk from left to right through its children.
A quick demo:
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BODY;
ACCESS;
CALL;
PARAMS;
}
start
: body EOF -> body
;
body
: expression (',' expression)* -> ^(BODY expression+)
;
expression
: atom
;
atom
: NUMBER
| (IDENT -> IDENT) ( tail -> ^(IDENT tail)
| call tail? -> ^(CALL IDENT call tail?)
)?
;
tail
: (access)+
;
access
: ('.' IDENT -> ^(ACCESS IDENT)) (call -> ^(CALL IDENT call))?
;
call
: '(' (expression (',' expression)*)? ')' -> ^(PARAMS expression*)
;
IDENT : LETTER+;
NUMBER : DIGIT+;
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : '0'..'9';
which can be tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "someVariable, somVariable.someMember, functionCall(param).someMember, " +
"foo.bar.baz(bjork).buffalo().xyzzy";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
TestParser parser = new TestParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
The output of Main corresponds to the following AST:
EDIT
And since you indicated your ultimate goal is not evaluating the input, but that you rather need to conform the structure of the AST to some 3rd party API, here's a grammar that will create an AST like you indicated in your edited question:
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BODY;
ACCESS_OP;
CALL;
PARAMS;
LHS;
RHS;
}
start
: body EOF -> body
;
body
: expression (',' expression)* -> ^(BODY expression+)
;
expression
: atom
;
atom
: NUMBER
| (ID -> ID) ( ('(' params ')' -> ^(CALL ID params))
('.' expression -> ^(ACCESS_OP ^(LHS ^(CALL ID params)) ^(RHS expression)))?
| '.' expression -> ^(ACCESS_OP ^(LHS ID) ^(RHS expression))
)?
;
params
: (expression (',' expression)*)? -> ^(PARAMS expression*)
;
ID : LETTER+;
NUMBER : DIGIT+;
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : '0'..'9';
which creates the following AST if you run the Main class:
The atom rule may be a bit daunting, but you can't shorten it much since the left ID needs to be available to most of the alternatives. ANTLRWorks helps in visualizing the alternative paths this rule may take:
which means atom can be any of the 5 following alternatives (with their corresponding AST's):
+----------------------+--------------------------------------------------------+
| alternative | generated AST |
+----------------------+--------------------------------------------------------+
| NUMBER | NUMBER |
| ID | ID |
| ID params | ^(CALL ID params) |
| ID params expression | ^(ACCESS_OP ^(LHS ^(CALL ID params)) ^(RHS expression))|
| ID expression | ^(ACCESS_OP ^(LHS ID) ^(RHS expression) |
+----------------------+--------------------------------------------------------+