How do I represent epsilon when I making the rules inside my Yacc file? - yacc

I am trying to implement a simple C minus parser in lex/yacc. I tested my code with a very simple set of rules and it worked. But now, when I try to add in the actual rules, I am getting this error everywhere that there is an epsilon rule. error: syntax error, unexpected identifier
I have looked up how to represent epsilon in yacc and every where I look says that I can just leave the rule empty. I have even tried putting a comment in the rule because I have seen that as well. But, I suppose the issue could lie somewhere else.
Why is this not working for me?
Below is my .y file and the first error says this (unexpected identifier):
arglist {};
^^^^^^^
void yyerror (char *s);
#include <stdio.h>
#include <stdlib.h>
int yylex();
extern int yytext[];
extern FILE *yyin;
%}
%start program
%token LTE GTE BEQUALS NOTEQUALS BEGCOMMENT ENDCOMMENT COMMENT GREATER LESS COMMA PLUS SUB MULT DIV EQUAL LP RP LB RB LC RC SEMICOLON INT FLOAT VOID IF WHILE RETURN ELSE ID NUM INVALID
%%
program : declarationlist { printf("\nACCEPT\n"); };
declarationlist : declaration declarationlistPrime {};
declaration : typespecifier ID DDD {};
DDD : vardeclarationPrime {};
| LP params RP compoundstmt {};
vardeclaration : typespecifier ID vardeclarationPrime {};
typespecifier : INT {};
| VOID {};
| FLOAT {};
params : paramlist {};
| VOID {};
paramlist : param paramlistPrime {};
param : INT ID paramPrime {};
| VOID ID paramPrime {};
| FLOAT ID paramPrime {};
compoundstmt : LC localdeclarations statementlist RC {};
localdeclarations : localdeclarationsPrime {};
statementlist : statementlistPrime {};
statement : expressionstmt {};
| LC localdeclarations statementlist RC {};
| selectionstmt {};
| iterationstmt {};
| returnstmt {};
expressionstmt : expression SEMICOLON {};
| SEMICOLON {};
selectionstmt : IF LP expression RP statement selectionstmtPrime {};
iterationstmt : WHILE LP expression RP statement {};
returnstmt : RETURN returnstmtPrime {};
expression : ID FFF {};
| LP expression RP termPrime SSS {};
| NUM termPrime SSS {};
FFF : LP args RP termPrime SSS {};
| varPrime XXX {};
XXX : EQUAL expression arglistPrime {};
| termPrime additiveexpressionPrime SSS {};
var : ID varPrime {};
SSS : additiveexpressionPrime arglistPrime {};
| relop additiveexpression arglistPrime {};
relop : LTE {};
| LESS {};
| GREATER {};
| GTE {};
| BEQUALS {};
| NOTEQUALS {};
additiveexpression : term additiveexpressionPrime {};
addop : ADD {};
| SUB {};
term : factor termPrime {};
mulop : MULT {};
| DIV {};
factor : LP expression RP {};
| ID factorXYZ {};
| NUM {};
factorXYZ : varPrime {};
| LP args RP {};
args : /* epsilon */ | {};
arglist {};
arglist : ID CS {};
| LP expression RP termPrime FID {};
| NUM termPrime FID {};
CS : varPrime EEE {};
| LP args RP termPrime FID {};
EEE : EQUAL expression arglistPrime {};
| termPrime FID {};
FID : relop additiveexpression arglistPrime {};
| additiveexpressionPrime arglistPrime {};
vardeclarationPrime : SEMICOLON {};
| LB NUM RB SEMICOLON {};
paramPrime : /* epsilon */ | {};
LB RB {};
selectionstmtPrime : |
ELSE statement {};
returnstmtPrime : SEMICOLON {};
| ID CCC {};
| NUM termPrime BBB {};
| LP expression RP termPrime BBB {};
AAA : EQUAL expression SEMICOLON {};
| termPrime BBB {};
BBB : relop additiveexpression SEMICOLON {};
| additiveexpressionPrime SEMICOLON {};
CCC : varPrime AAA {};
| LP args RP termPrime BBB {};
varPrime : | {};
LB expression RB {};
declarationlistPrime : | {};
declaration declarationlistPrime {};
paramlistPrime : | {};
COMMA param paramlistPrime {};
localdeclarationsPrime : | {};
vardeclaration localdeclarationsPrime {};
statementlistPrime : | {};
statement statementlistPrime {};
additiveexpressionPrime : | {};
addop term additiveexpressionPrime {};
termPrime : | {};
mulop factor termPrime {};
arglistPrime : | {};
COMMA expression arglistPrime {};
%%
int main(int argc, char *argv[])
{
yyin = fopen(argv[1], "r");
if (!yyin)
{
printf("no file\n");
exit(0);
}
yyparse();
}
void yyerror(char *s)
{
printf("\nREJECT\n");
// printf("error from yyerror\n");
exit(0);
}
int yywrap()
{
// printf("in yywarp\n");
exit(0);
}```
Please help, I have tried everything I can think of.

You have
args : /* epsilon */ | {};
Which is two productions for args, both of them empty, and one with an explicit no-op action. That's terminated with a semi-colon, meaning the end of the rules for args. Thus, bison is not expecting another right-hand side. It's expecting rules for some different non-terminal.
What you meant, I suppose, was
args : /* epsilon */
| arglist
;
Note that there is no need to explicitly add an empty action Leaving the action out altogether (as above) is exactly the same, and is arguably less noisy.
Better style with bison is to use the %empty marker instead of a comment, because bison will ensure that a rule with %empty really is empty:
args : %empty
| arglist
;

Related

Using a grammar with a visitor to calculate arithmetic expressions

We've been given a grammar in class that looks like this:
grammar Calculator;
#header {
import java.util.*;
}
#parser::members {
/** "memory" for our calculator; variable/value pairs go here */
Map<String, Double> memory = new HashMap<String, Double>();
}
statlist : stat+ ;
stat : vgl NL #printCompare
| ass NL #printAssign
| NL #blank
;
ass : <assoc=right> VAR ('=') vgl #assign
;
vgl : sum(op=('<'|'>') sum)* #compare
;
sum : prod(op=('+'|'-') prod)* #addSub
;
prod : pot(op=('*'|'/') pot)* #mulDiv
;
pot :<assoc=right> term(op='^' pot)? #poten
;
term : '+' term #add
| '-' term #subtract
| '(' sum ')' #parens
| VAR #var
| INT #int
;
/*Rules for the lexer */
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
BIG : '>' ;
SML : '<' ;
POT : '^' ;
VAR : [a-zA-Z]+ ;
NL : [\n] ;
INT : [0-9]+ ;
WS : [ \r\t]+ -> skip ; // skip spaces, tabs
I am having problems translating constructs like these
sum : prod(op=('+'|'-') prod)* #addSub
into working code. Currently the corresponding method looks like this:
/** prod(op=('+'|'-') prod)* */
#Override
public Double visitAddSub(CalculatorParser.AddSubContext ctx) {
double left = visit(ctx.prod(0));
if(ctx.op == null){
return left;
}
double right = visit(ctx.prod(1));
return (ctx.op.getType() == CalculatorParser.ADD) ? left+right : left-right;
}
Current output would look like this
3+3+3
6.0
which is obviously false. How do I get my visitor to visit the nodes correctly without touching the grammar?
Take a look at the rule:
prod(op=('+'|'-') prod)*
See that *? It means that what's inside the parentheses can come up 0 or more times.
Your visitor code assumes there will either be only one or two child prod, but no more. That's why you see 6.0: the parser put 3+3+3 into the context, but your visitor only processed 3+3 and leaved the final +3 out.
So just use a while loop over all the op and prod children, and accumulate them into the result.
Okay, with the help of Lucas and the usage of op+= I manage to fix my problem. It looks pretty complicated but it works.
/** prod(op+=('+'|'-') prod)* */
#Override
public Double visitAddSub(CalculatorParser.AddSubContext ctx) {
Stack<Double> temp = new Stack<Double>();
switch(ctx.children.size()){
case 1: return visit(ctx.prod(0));
default:
Double ret = 0.0;
for(int i = 0; i < ctx.op.size(); i++){
if(ctx.op.get(i).getType()==CalculatorParser.ADD){
if(temp.isEmpty()) {
ret = visit(ctx.prod(i)) + visit(ctx.prod(i+1));
temp.push(ret);
} else {
ret = temp.pop() + visit(ctx.prod(i+1));
temp.push(ret);
}
} else {
if(temp.isEmpty()) {
ret = visit(ctx.prod(i)) - visit(ctx.prod(i+1));
temp.push(ret);
} else {
ret = temp.pop() - visit(ctx.prod(i+1));
temp.push(ret);
}
}
}
}
return temp.pop();
}
We are using a switch-case to determine how many children this context has. If its more than 3 we have atleast 2 operators. We're then using the individual operator and a stack to determine the result.

ANTLR v3 Treewalker class. How to evaluate right associative function such as Factorial

I'm trying to build an expression evaluator with ANTLR v3 but I can't get the factorial function because it is right associative.
This is the code:
class ExpressionParser extends Parser;
options { buildAST=true; }
imaginaryTokenDefinitions :
SIGN_MINUS
SIGN_PLUS;
expr : LPAREN^ sumExpr RPAREN! ;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : runary (POW^ runary)? ;
runary : unary (FAT)?;
unary : (SIN^|COS^|TAN^|LOG^|LN^|RAD^)* signExpr;
signExpr : (
m:MINUS^ {#m.setType(SIGN_MINUS);}
| p:PLUS^ {#p.setType(SIGN_PLUS);}
)? atom ;
atom : NUMBER | expr ;
class ExpressionLexer extends Lexer;
PLUS : '+' ;
MINUS : '-' ;
MUL : '*' ;
DIV : '/' ;
MOD : '%' ;
POW : '^' ;
SIN : 's' ;
COS : 'c' ;
TAN : 't' ;
LOG : 'l' ;
LN : 'n' ;
RAD : 'r' ;
FAT : 'f' ;
LPAREN: '(' ;
RPAREN: ')' ;
SEMI : ';' ;
protected DIGIT : '0'..'9' ;
NUMBER : (DIGIT)+ ('.' (DIGIT)+)?;
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; int i,f=1; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| #(SIN a=expr ) { r=Math.sin(a); }
| #(COS a=expr ) { r=Math.cos(a); }
| #(TAN a=expr ) { r=Math.tan(a); }
| #(LOG a=expr ) { r=Math.log10(a); }
| #(LN a=expr ) { r=Math.log(a); }
| #(RAD a=expr ) { r=Math.sqrt(a); }
| #(FAT a=expr ) { for(i=1; i<=a; i++){f=f*i;}; r=(double)f;}
| #(LPAREN a=expr) { r=a; }
| #(SIGN_MINUS a=expr) { r=-1*a; }
| #(SIGN_PLUS a=expr) { if(a<0)r=0-a; else r=a; }
| d:NUMBER { r=Double.parseDouble(d.getText()); } ;
if I change FAT matching case in class TreeWalker with something like this:
| #(a=expr FAT ) { for(i=1; i<=a; i++){f=f*i;}; r=(double)f;}
I get this errors:
Expression.g:56:7: rule classDef trapped:
Expression.g:56:7: unexpected token: a
error: aborting grammar 'ExpressionTreeWalker' due to errors
Exiting due to errors.
Your tree walker (the original one) is fine, as far as I can see.
However, you probably need to mark FAT in the grammar:
runary : unary (FAT^)?;
(Note the hat ^, as in all the other productions.)
Edit:
As explained in the Antlr3 wiki, the hat operator is needed to make the node the "root of subtree created for entire enclosing rule even if nested in a subrule". In this case, the ! operator is nested in a conditional subrule ((FAT)?). That's independent of whether the operator is prefix or postfix.
Note that in your grammar the ! operator is not right-associative since a!! is not valid at all. But I would say that associativity is only meaningful for infix operators.

Using Tree Walker with Boolean checks + capturing the whole expression

I have actually two questions that I hope can be answered as they are semi-dependent on my work. Below is the grammar + tree grammar + Java test file.
What I am actually trying to achieve is the following:
Question 1:
I have a grammar that parses my language correctly. I would like to do some semantic checks on variable declarations. So I created a tree walker and so far it semi works. My problem is it's not capturing the whole string of expression. For example,
float x = 10 + 10;
It is only capturing the first part, i.e. 10. I am not sure what I am doing wrong. If I did it in one pass, it works. Somehow, if I split the work into a grammar and tree grammar, it is not capturing the whole string.
Question 2:
I would like to do a check on a rule such that if my conditions returns true, I would like to remove that subtree. For example,
float x = 10;
float x; // <================ I would like this to be removed.
I have tried using rewrite rules but I think it is more complex than that.
Test.g:
grammar Test;
options {
language = Java;
output = AST;
}
parse : varDeclare+
;
varDeclare : type id equalExp? ';'
;
equalExp : ('=' (expression | '...'))
;
expression : binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
TestTree.g:
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
}
#members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
parse[SemanticCheck s]
: varDeclare+
;
varDeclare : type id equalExp? ';'
{s.check($type.name, $id.text, $equalExp.expr);}
;
equalExp returns [String expr]
: ('=' (expression {$expr = $expression.e;} | '...' {$expr = "...";}))
;
expression returns [String e]
#after {$e = $expression.text;}
: binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type returns [String name]
#after { $name = $type.text; }
: 'int'
| 'float'
;
Java test file, Test.java:
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RuleReturnScope;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10+y; \n" +
"float x; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
//TestLexer lexer = new TestLexer(new ANTLRFileStream("input.txt"));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
RuleReturnScope r = parser.parse();
System.out.println("Parse Tree:\n" + tokenStream.toString());
CommonTree t = (CommonTree)r.getTree();
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
walker.parse(s);
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
System.out.println("Remove statement! Already defined!");
return true;
}
names.add(variableName);
return false;
}
}
Thanks in advance!
I figured out my problem and it turns out I needed to build an AST first before I can do anything. This would help in understanding what is a flat tree look like vs building an AST.
How to output the AST built using ANTLR?
Thanks to Bart's endless examples here in StackOverFlow, I was able to do semantic predicates to do what I needed in the example above.
Below is the updated code:
Test.g
grammar Test;
options {
language = Java;
output = AST;
}
tokens {
VARDECL;
Assign = '=';
EqT = '==';
NEq = '!=';
LT = '<';
LTEq = '<=';
GT = '>';
GTEq = '>=';
NOT = '!';
PLUS = '+';
MINUS = '-';
MULT = '*';
DIV = '/';
}
parse : varDeclare+
;
varDeclare : type id equalExp ';' -> ^(VARDECL type id equalExp)
;
equalExp : (Assign^ (expression | '...' ))
;
expression : binaryExpression
;
binaryExpression : addingExpression ((EqT|NEq|LTEq|GTEq|LT|GT)^ addingExpression)*
;
addingExpression : multiplyingExpression ((PLUS|MINUS)^ multiplyingExpression)*
;
multiplyingExpression : unaryExpression
((MULT|DIV)^ unaryExpression)*
;
unaryExpression: ((NOT|MINUS))^ primitiveElement
| primitiveElement
;
primitiveElement : literalExpression
| id
| '(' expression ')' -> expression
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
This should automatically build an AST whenever you have varDeclare. Now on to the tree grammar/walker.
TestTree.g
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
output = AST;
}
tokens {
REMOVED;
}
#members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
start[SemanticCheck s] : varDeclare+
;
varDeclare : ^(VARDECL type id equalExp)
-> {s.check($type.text, $id.text, $equalExp.text)}? REMOVED
-> ^(VARDECL type id equalExp)
;
equalExp : ^(Assign expression)
| ^(Assign '...')
;
expression : ^(('!') expression)
| ^(('+'|'-'|'*'|'/') expression expression*)
| ^(('=='|'<='|'<'|'>='|'>'|'!=') expression expression*)
| literalExpression
;
literalExpression : INT
| id
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
Now on to test it:
Test.java
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10; \n" +
"int x = 1; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
TestParser.parse_return r = parser.parse();
System.out.println("Tree:" + ((Tree)r.tree).toStringTree() + "\n");
CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)r.tree);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
TestTree.start_return r2 = walker.start(s);
System.out.println("\nTree Walker: "+((Tree)r2.tree).toStringTree());
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
return true;
}
names.add(variableName);
return false;
}
}
Output:
Tree:(VARDECL float x (= 10)) (VARDECL int x (= 1))
Type: float variableName: x exp: = 10
Type: int variableName: x exp: = 1
Tree Walker: (VARDECL float x (= 10)) REMOVED
Hope this helps! Please feel free to point any errors if I did something wrong.

Generating simple AST in ANTLR

I'm playing a bit around with ANTLR, and wish to create a function like this:
MOVE x y z pitch roll
That produces the following AST:
MOVE
|---x
|---y
|---z
|---pitch
|---roll
So far I've tried without luck, and I keep getting the AST to have the parameters as siblings, rather than children.
Code so far:
C#:
class Program
{
const string CRLF = "\r\n";
static void Main(string[] args)
{
string filename = "Script.txt";
var reader = new StreamReader(filename);
var input = new ANTLRReaderStream(reader);
var lexer = new ScorBotScriptLexer(input);
var tokens = new CommonTokenStream(lexer);
var parser = new ScorBotScriptParser(tokens);
var result = parser.program();
var tree = result.Tree as CommonTree;
Print(tree, "");
Console.Read();
}
static void Print(CommonTree tree, string indent)
{
Console.WriteLine(indent + tree.ToString());
if (tree.Children != null)
{
indent += "\t";
foreach (var child in tree.Children)
{
var childTree = child as CommonTree;
if (childTree.Text != CRLF)
{
Print(childTree, indent);
}
}
}
}
ANTLR:
grammar ScorBotScript;
options
{
language = 'CSharp2';
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
memoize = true;
}
#parser::namespace { RSD.Scripting }
#lexer::namespace { RSD.Scripting }
program
: (robotInstruction CRLF)*
;
robotInstruction
: moveCoordinatesInstruction
;
/**
* MOVE X Y Z PITCH ROLL
*/
moveCoordinatesInstruction
: 'MOVE' x=INT y=INT z=INT pitch=INT roll=INT
;
INT : '-'? ( '0'..'9' )*
;
COMMENT
: '//' ~( CR | LF )* CR? LF { $channel = HIDDEN; }
;
WS
: ( ' ' | TAB | CR | LF ) { $channel = HIDDEN; }
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
fragment TAB
: '\t'
;
fragment CR
: '\r'
;
fragment LF
: '\n'
;
CRLF
: (CR ? LF) => CR ? LF
| CR
;
parse
: ID
| INT
| COMMENT
| STRING
| WS
;
I'm a beginner with ANTLR myself, this confused me too.
I think if you want to create a tree from your grammar that has structure, you augment your grammar with hints using the ^ and ! characters. This examples page shows how.
From the linked page:
By default ANTLR creates trees as
"sibling lists".
The grammar must be annotated to with
tree commands to produce a parser that
creates trees in the correct shape
(that is, operators at the root, which
operands as children). A somewhat more
complicated expression parser can be
seen here and downloaded in tar form
here. Note that grammar terminals
which should be at the root of a
sub-tree are annotated with ^.

ANTLR: multiplication omiting '*' symbol

I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be included. I need it to output an AST. So for input like this:
1 2 / 3 4
I want the AST to be
(* (/ (* 1 2) 3) 4)
I've hit upon the following, which uses java code to create the appropriate nodes:
grammar TestProd;
options {
output = AST;
}
tokens {
PROD;
}
DIV : '/';
multExpr: (INTEGER -> INTEGER)
( {div = null;}
div=DIV? b=INTEGER
->
^({$div == null ? (Object)adaptor.create(PROD, "*") : (Object)adaptor.create(DIV, "/")}
$multExpr $b))*
;
INTEGER: ('0' | '1'..'9' '0'..'9'*);
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };
This works. But is there a better/simpler way?
Here's a way:
grammar Test;
options {
backtrack=true;
output=AST;
}
tokens {
MUL;
DIV;
}
parse
: expr* EOF
;
expr
: (atom -> atom)
( '/' a=atom -> ^(DIV $expr $a)
| a=atom -> ^(MUL $expr $a)
)*
;
atom
: Number
| '(' expr ')' -> expr
;
Number
: '0'..'9'+
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String[] args) throws Exception {
String source = "1 2 / 3 4";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.parse_return result = parser.parse();
Tree tree = (Tree)result.getTree();
System.out.println(tree.toStringTree());
}
}
produced:
(MUL (DIV (MUL 1 2) 3) 4)