Antlr setText not working in the way I expected

I have a requirement to convert an identifier into a beanutil string for retrieving an item from an object. The identifier-to-string conversions look like:
name ==> name
attribute.name ==> attributes(name)[0].value
attribute.name[2] ==> attributes(name)[2].value
address.attribute.postalcode ==> contactDetails.addresses[0].attributes(postalcode)[0].value
address[2].attribute.postalcode ==> contactDetails.addresses[2].attributes(postalcode)[0].value
address[2].attribute.postalcode[3] ==> contactDetails.addresses[2].attributes(postalcode)[3].value
Now I have decided to do this using antlr, as I feel it's probably going to be just as quick as using a set of 'if' statements. Feel free to tell me I'm wrong.
Right now, I've got this partially working using antlr. However, once I start doing the 'address' ones, the setText part seems to stop working for Attribute.
Am I doing this the correct way or is there a better way of using antlr to get the result I want?
grammar AttributeParser;
parse returns [ String result ]
: Address EOF { $result = $Address.text; }
| Attribute EOF { $result = $Attribute.text; }
| Varname EOF { $result = $Varname.text; }
;
Address
: 'address' (Arraypos)* '.' Attribute { setText("contactDetails.addresses" + ($Arraypos == null ? "[0]" : $Arraypos.text ) + "." + $Attribute.text); }
;
Attribute
: 'attribute.' Varname (Arraypos)* { setText("attributes(" + $Varname.text + ")" + ($Arraypos == null ? "[0]" : $Arraypos.text ) + ".value"); }
;
Arraypos
: '[' Number+ ']'
;
Varname
: ('a'..'z'|'A'..'Z')+
;
Number
: '0'..'9'+
;
Spaces
: (' ' | '\t' | '\r' | '\n')+ { setText(" "); }
;
Below are two unit tests, the first returns what I expect, the second doesn't.
@Test
public void testSimpleAttributeWithArrayRef() throws Exception {
String source = "attribute.name[2]";
ANTLRStringStream in = new ANTLRStringStream(source);
AttributeParserLexer lexer = new AttributeParserLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AttributeParserParser parser = new AttributeParserParser(tokens);
String result = parser.parse();
assertEquals("attributes(name)[2].value", result);
}
@Test
public void testAddress() throws Exception {
String source = "address.attribute.postalcode";
ANTLRStringStream in = new ANTLRStringStream(source);
AttributeParserLexer lexer = new AttributeParserLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AttributeParserParser parser = new AttributeParserParser(tokens);
String result = parser.parse();
System.out.println("Result: " + result);
assertEquals("contactDetails.addresses[0].attributes(postalcode)[0].value", result);
}

No, you can't do (Arraypos)* and then refer to the contents like this: $Arraypos.text.
I wouldn't go changing the inner text of the tokens, but create a couple of parser rules and let them return the appropriate text.
A little demo:
grammar AttributeParser;
parse returns [String s]
: input EOF {$s = $input.s;}
;
input returns [String s]
: address {$s = $address.s;}
| attribute {$s = $attribute.s;}
| Varname {$s = $Varname.text;}
;
address returns [String s]
: Address arrayPos '.' attribute
{$s = "contactDetails.addresses" + $arrayPos.s + "." + $attribute.s;}
;
attribute returns [String s]
: Attribute '.' Varname arrayPos
{$s = "attributes(" + $Varname.text + ")" + $arrayPos.s + ".value" ;}
;
arrayPos returns [String s]
: Arraypos {$s = $Arraypos.text;}
| /* nothing */ {$s = "[0]";}
;
Attribute : 'attribute';
Address : 'address';
Arraypos : '[' '0'..'9'+ ']';
Varname : ('a'..'z' | 'A'..'Z')+;
which can be tested with:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String[][] tests = {
{"name", "name"},
{"attribute.name", "attributes(name)[0].value"},
{"attribute.name[2]", "attributes(name)[2].value"},
{"address.attribute.postalcode", "contactDetails.addresses[0].attributes(postalcode)[0].value"},
{"address[2].attribute.postalcode", "contactDetails.addresses[2].attributes(postalcode)[0].value"},
{"address[2].attribute.postalcode[3]", "contactDetails.addresses[2].attributes(postalcode)[3].value"}
};
for(String[] test : tests) {
String input = test[0];
String expected = test[1];
AttributeParserLexer lexer = new AttributeParserLexer(new ANTLRStringStream(input));
AttributeParserParser parser = new AttributeParserParser(new CommonTokenStream(lexer));
String output = parser.parse();
if(!output.equals(expected)) {
throw new RuntimeException(output + " != " + expected);
}
System.out.printf("in = %s\nout = %s\n\n", input, output);
}
}
}
And to run the demo do:
java -cp antlr-3.3.jar org.antlr.Tool AttributeParser.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
which will print the following to the console:
in = name
out = name
in = attribute.name
out = attributes(name)[0].value
in = attribute.name[2]
out = attributes(name)[2].value
in = address.attribute.postalcode
out = contactDetails.addresses[0].attributes(postalcode)[0].value
in = address[2].attribute.postalcode
out = contactDetails.addresses[2].attributes(postalcode)[0].value
in = address[2].attribute.postalcode[3]
out = contactDetails.addresses[2].attributes(postalcode)[3].value
EDIT
Note that you can also let parser rules return more than just one object like this:
bar
: foo {System.out.println($foo.text + ", " + $foo.number);}
;
foo returns [String text, int number]
: 'FOO' {$text = "a"; $number = 1;}
| 'foo' {$text = "b"; $number = 2;}
;
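For comparison with the "set of 'if' statements" idea from the question, here is a minimal plain-Java sketch of the same six mappings using a regular expression instead of ANTLR. The class and method names (BeanPathSketch, convert, orZero) are made up for illustration and are not part of the grammar above. It also differs slightly from the grammar: a bare "attribute" or "address" is accepted as a plain name here, whereas the lexer above would tokenize those as keywords.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post: regex version of the mapping.
final class BeanPathSketch {
    // group 1: "address" (optional), group 2: its index (optional),
    // group 3: the attribute name,   group 4: its index (optional)
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "(?:(address)(\\[\\d+\\])?\\.)?attribute\\.([a-zA-Z]+)(\\[\\d+\\])?");
    private static final Pattern PLAIN = Pattern.compile("[a-zA-Z]+");

    static String convert(String id) {
        Matcher m = ATTRIBUTE.matcher(id);
        if (m.matches()) {
            String prefix = m.group(1) == null
                    ? ""
                    : "contactDetails.addresses" + orZero(m.group(2)) + ".";
            return prefix + "attributes(" + m.group(3) + ")" + orZero(m.group(4)) + ".value";
        }
        if (PLAIN.matcher(id).matches()) {
            return id; // a simple name maps to itself
        }
        throw new IllegalArgumentException("Unrecognised identifier: " + id);
    }

    // A missing array position defaults to the first element.
    private static String orZero(String index) {
        return index == null ? "[0]" : index;
    }

    public static void main(String[] args) {
        System.out.println(convert("address[2].attribute.postalcode"));
    }
}
```

Whether this is preferable to the grammar depends on how much the identifier syntax is expected to grow; the parser-rule version is easier to extend with further nesting.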

Related

Using Tree Walker with Boolean checks + capturing the whole expression

I have actually two questions that I hope can be answered as they are semi-dependent on my work. Below is the grammar + tree grammar + Java test file.
What I am actually trying to achieve is the following:
Question 1:
I have a grammar that parses my language correctly. I would like to do some semantic checks on variable declarations. So I created a tree walker and so far it semi works. My problem is it's not capturing the whole string of expression. For example,
float x = 10 + 10;
It is only capturing the first part, i.e. 10. I am not sure what I am doing wrong. If I did it in one pass, it works. Somehow, if I split the work into a grammar and tree grammar, it is not capturing the whole string.
Question 2:
I would like to do a check on a rule such that if my conditions returns true, I would like to remove that subtree. For example,
float x = 10;
float x; // <================ I would like this to be removed.
I have tried using rewrite rules but I think it is more complex than that.
Test.g:
grammar Test;
options {
language = Java;
output = AST;
}
parse : varDeclare+
;
varDeclare : type id equalExp? ';'
;
equalExp : ('=' (expression | '...'))
;
expression : binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
TestTree.g:
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
}
@members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
parse[SemanticCheck s]
: varDeclare+
;
varDeclare : type id equalExp? ';'
{s.check($type.name, $id.text, $equalExp.expr);}
;
equalExp returns [String expr]
: ('=' (expression {$expr = $expression.e;} | '...' {$expr = "...";}))
;
expression returns [String e]
@after {$e = $expression.text;}
: binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type returns [String name]
@after { $name = $type.text; }
: 'int'
| 'float'
;
Java test file, Test.java:
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RuleReturnScope;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10+y; \n" +
"float x; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
//TestLexer lexer = new TestLexer(new ANTLRFileStream("input.txt"));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
RuleReturnScope r = parser.parse();
System.out.println("Parse Tree:\n" + tokenStream.toString());
CommonTree t = (CommonTree)r.getTree();
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
walker.parse(s);
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
System.out.println("Remove statement! Already defined!");
return true;
}
names.add(variableName);
return false;
}
}
Thanks in advance!
I figured out my problem: it turns out I needed to build an AST first before I could do anything. The following question helps in understanding what a flat tree looks like compared with a proper AST:
How to output the AST built using ANTLR?
Thanks to Bart's endless examples here on Stack Overflow, I was able to use semantic predicates to do what I needed in the example above.
Below is the updated code:
Test.g
grammar Test;
options {
language = Java;
output = AST;
}
tokens {
VARDECL;
Assign = '=';
EqT = '==';
NEq = '!=';
LT = '<';
LTEq = '<=';
GT = '>';
GTEq = '>=';
NOT = '!';
PLUS = '+';
MINUS = '-';
MULT = '*';
DIV = '/';
}
parse : varDeclare+
;
varDeclare : type id equalExp ';' -> ^(VARDECL type id equalExp)
;
equalExp : (Assign^ (expression | '...' ))
;
expression : binaryExpression
;
binaryExpression : addingExpression ((EqT|NEq|LTEq|GTEq|LT|GT)^ addingExpression)*
;
addingExpression : multiplyingExpression ((PLUS|MINUS)^ multiplyingExpression)*
;
multiplyingExpression : unaryExpression
((MULT|DIV)^ unaryExpression)*
;
unaryExpression: ((NOT|MINUS))^ primitiveElement
| primitiveElement
;
primitiveElement : literalExpression
| id
| '(' expression ')' -> expression
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
This should automatically build an AST whenever you have varDeclare. Now on to the tree grammar/walker.
TestTree.g
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
output = AST;
}
tokens {
REMOVED;
}
@members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
start[SemanticCheck s] : varDeclare+
;
varDeclare : ^(VARDECL type id equalExp)
-> {s.check($type.text, $id.text, $equalExp.text)}? REMOVED
-> ^(VARDECL type id equalExp)
;
equalExp : ^(Assign expression)
| ^(Assign '...')
;
expression : ^(('!') expression)
| ^(('+'|'-'|'*'|'/') expression expression*)
| ^(('=='|'<='|'<'|'>='|'>'|'!=') expression expression*)
| literalExpression
;
literalExpression : INT
| id
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
Now on to test it:
Test.java
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10; \n" +
"int x = 1; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
TestParser.parse_return r = parser.parse();
System.out.println("Tree:" + ((Tree)r.tree).toStringTree() + "\n");
CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)r.tree);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
TestTree.start_return r2 = walker.start(s);
System.out.println("\nTree Walker: "+((Tree)r2.tree).toStringTree());
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
return true;
}
names.add(variableName);
return false;
}
}
Output:
Tree:(VARDECL float x (= 10)) (VARDECL int x (= 1))
Type: float variableName: x exp: = 10
Type: int variableName: x exp: = 1
Tree Walker: (VARDECL float x (= 10)) REMOVED
Hope this helps! Please feel free to point out any errors if I did something wrong.
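The duplicate check itself can be looked at in isolation from the tree walker. Below is a small plain-Java sketch of it, assuming each declaration is reduced to a {type, name} pair; the class name DuplicateDeclSketch and the filter method are made up for illustration. It uses Set.add, which returns false when the name is already present, in place of the contains-then-add pair in SemanticCheck.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the walker: keeps the first declaration of each
// name and replaces later ones with a REMOVED marker.
final class DuplicateDeclSketch {
    // Each declaration is a {type, name} pair.
    static List<String> filter(List<String[]> declarations) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String[] decl : declarations) {
            boolean firstTime = seen.add(decl[1]); // false if the name was already declared
            result.add(firstTime ? "(VARDECL " + decl[0] + " " + decl[1] + ")" : "REMOVED");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> decls = Arrays.asList(
                new String[] {"float", "x"},
                new String[] {"int", "x"});
        System.out.println(filter(decls)); // prints [(VARDECL float x), REMOVED]
    }
}
```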

ANTLR how to evaluate ifdef else end statement

I would like to know how to evaluate an ifdef ... else ... end statement that could be placed anywhere in the code. I start with a simple example that implements two basic functions, add(p1,p2) and diff(p1,p2), where p1 and p2 are Strings. Each one only puts a + or a - between p1 and p2. Here is my grammar:
grammar ifdef;
options {
language = Java;
output = AST;
ASTLabelType=CommonTree;
}
tokens
{
EQUAL = '=' ;
HASH = '#' ;
DBLEQUOTE = '"' ;
SEMICOLON = ';' ;
}
@header {
package Grammar;
import java.util.Map;
import java.util.HashMap;
}
@lexer::header {
package Grammar;
}
@members {
private Map<String, String> strMapID = new HashMap<String, String>();
private Map<String, String> strMapDefine = new HashMap<String, String>();
}
rule returns [String strEval]
: { StringBuilder strBuilder = new StringBuilder(); }
( command
{ if ( !"".equals($command.str) ) {
strBuilder.append( $command.str );
strBuilder.append( "\n" );
}
}
)+ EOF
{ $strEval = strBuilder.toString(); }
;
command returns [String str]
: define { $str=""; }
| undef { $str=""; }
| set { $str=""; }
| function { $str = $function.str; }
| ifdef { $str=""; }
;
define
: HASH 'define' ID
{ strMapDefine.put($ID.text, $ID.text); } // save define ID into hash table
;
undef
: HASH 'undef' ID
{ if ( strMapDefine.containsKey($ID.text) ) {
strMapDefine.remove($ID.text); // undef ID in hash table
}
}
;
set
: 'set' ID EQUAL string SEMICOLON
{ strMapID.put($ID.text, $string.text); } // save ID,string definition into hash table
;
string
: DBLEQUOTE expr DBLEQUOTE
;
function returns [String str]
: add { $str = $add.str; }
| diff { $str = $diff.str; }
;
add returns [String str]
: 'add' '(' p1=param ',' p2=param ')'
{ StringBuilder strBuilder = new StringBuilder();
strBuilder.append( $p1.str );
strBuilder.append( "+" );
strBuilder.append( $p2.str );
$str = strBuilder.toString();
}
;
diff returns [String str]
: 'diff' '(' p1=param ',' p2=param ')'
{ StringBuilder strBuilder = new StringBuilder();
strBuilder.append( $p1.str );
strBuilder.append( "-" );
strBuilder.append( $p2.str );
$str = strBuilder.toString();
}
;
param returns [String str]
: ID { $str = strMapID.get($ID.text); } // assign ID definition to str
| string { $str = $string.text; }
| function { $str = $function.str; }
;
ifdef
: { Boolean bElse= false;
StringBuilder strBuilder = new StringBuilder(); }
HASH 'ifdef' '(' ID ')' c1=expr ( HASH 'else' c2=expr { bElse= true; } )? HASH 'end'
{ if ( strMapDefine.containsKey($ID.text) ) {
strBuilder.append($c1.text);
}
else {
if ( bElse ) {
strBuilder.append($c2.text);
}
}
System.out.println("ifdef content is : " + strBuilder.toString() + "\n" );
}
;
expr
: .+
;
ID
: ('a'..'z' | 'A'..'Z')+
;
WS
: ( ' ' | '\t' | '\n' | '\r' ) {$channel=HIDDEN;}
;
and java main class
package Grammar;
import java.io.IOException;
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.TokenStream;
import Grammar.ifdefParser.rule_return;
public class mainifdef {
public static void main(String[] args) throws RecognitionException {
CharStream stream=null;
try {
stream = new ANTLRFileStream("src/input/test.txt");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
ifdefLexer lexer = new ifdefLexer(stream);
TokenStream tokenStream = new CommonTokenStream(lexer);
ifdefParser parser = new ifdefParser(tokenStream);
rule_return evaluator = parser.rule();
System.out.println("Parsing Tree is \n" + evaluator.tree.toStringTree() + "\n");
System.out.println("Evaluation is \n" + evaluator.strEval + "\n");
}
}
With input file content like:
#undef Z
set AB = "toto";
set CD = "titi";
diff(CD,"essai")
#ifdef(Z) add( #end
add ( AB , diff( CD,"essai") )
The result console is as expected
ifdef content is :
Parsing Tree is
# undef Z set AB = " toto " ; set CD = " titi " ; diff ( CD , " essai " ) # ifdef ( Z ) add ( # end add ( AB , diff ( CD , " essai " ) )
Evaluation is
"titi"-"essai"
"toto"+"titi"-"essai"
My question is how to evaluate input text like the following, which should give the same result:
#undef Z
set AB = "toto";
set CD = "titi";
diff(CD,"essai")
#ifdef(Z) add( #end
add ( AB , #ifdef(Z) add( #else diff( #end CD,"essai") )
The result console is (which is normal ANTLR behavior):
ifdef content is :
src/input/test.txt line 8:11 no viable alternative at input '#'
src/input/test.txt line 8:43 missing EOF at 'CD'
ifdef content is : diff(
Parsing Tree is
# undef Z set AB = " toto " ; set CD = " titi " ; diff ( CD , " essai " ) # ifdef ( Z ) add ( # end add ( AB , < unexpected: [#59,94:94='#',< 6 >,8:11], resync=# > < missing ')'> # ifdef ( Z ) add ( # else diff ( # end < missing EOF >
Evaluation is
"titi"-"essai"
"toto"+null
Some guidance on where to start would be much appreciated.
Regards JPM
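One possible place to start, which is not from the original post, is to resolve the directives in a separate pass before the text reaches the parser, so the grammar never sees a '#ifdef' in the middle of a function call. The sketch below is plain Java and makes several assumptions: directives are not nested, the set of defined symbols is passed in rather than tracked from #define/#undef lines, and the names IfdefPreprocessorSketch and resolve are made up for illustration.

```java
import java.util.Collections;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-pass: replaces each "#ifdef(ID) a [#else b] #end" with a or b.
final class IfdefPreprocessorSketch {
    // group 1: symbol, group 2: text before #else/#end, group 3: text after #else (optional)
    private static final Pattern IFDEF = Pattern.compile(
            "#ifdef\\s*\\(\\s*([A-Za-z]+)\\s*\\)(.*?)(?:#else(.*?))?#end",
            Pattern.DOTALL);

    static String resolve(String source, Set<String> defined) {
        Matcher m = IFDEF.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String chosen;
            if (defined.contains(m.group(1))) {
                chosen = m.group(2);                            // symbol defined: keep first branch
            } else {
                chosen = m.group(3) != null ? m.group(3) : ""; // else branch, or nothing
            }
            m.appendReplacement(out, Matcher.quoteReplacement(chosen));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "add ( AB , #ifdef(Z) add( #else diff( #end CD,\"essai\") )";
        // Z is not defined here, so the #else branch is kept.
        System.out.println(resolve(line, Collections.<String>emptySet()));
    }
}
```

The resolved text could then be handed to the existing lexer and parser unchanged. Handling #define and #undef in order, or nested directives, would need a line-by-line pass rather than a single regular expression.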

ANTLRWorks: Can't get operators to work

I've been trying to learn ANTLR for some time and finally got my hands on The Definitive ANTLR reference.
Well I tried the following in ANTLRWorks 1.4
grammar Test;
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
expression
: INT ('+'^ INT)*;
When I pass 2+4 and process expression, I don't get a tree with + as the root and 2 and 4 as the child nodes. Rather, I get expression as the root and 2, + and 4 as child nodes at the same level.
I can't figure out what I am doing wrong. I need help desperately.
BTW, how can I get those graphic descriptions?
Yes, you get expression as the root because expression is the only rule in your grammar, and it is the rule being returned.
I have just added a virtual token PLUS to your example, along with a rewrite expression that shows the result you are expecting.
But it seems that you have already found the solution :o)
grammar Test;
options {
output=AST;
ASTLabelType = CommonTree;
}
tokens {PLUS;}
@members {
public static void main(String [] args) {
try {
TestLexer lexer =
new TestLexer(new ANTLRStringStream("2+2"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.expression_return p_result = parser.expression();
CommonTree ast = p_result.tree;
if( ast == null ) {
System.out.println("resultant tree: is NULL");
} else {
System.out.println("resultant tree: " + ast.toStringTree());
}
} catch(Exception e) {
e.printStackTrace();
}
}
}
expression
: INT ('+' INT)* -> ^(PLUS INT+);
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
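To make the intended tree shape concrete without depending on the ANTLR runtime, here is a small plain-Java sketch that emulates what the ^ suffix in INT ('+'^ INT)* is meant to do: each operator becomes the new root, with the tree built so far as its first child. The Node class and the build method are made up for illustration and only mimic the toStringTree() notation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emulation of root promotion; not ANTLR's own tree classes.
final class RootOperatorSketch {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) { this.text = text; }

        Node add(Node child) { children.add(child); return this; }

        // LISP-like form: a leaf prints as its text, an inner node as "(root child child)".
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node child : children) sb.append(' ').append(child.toStringTree());
            return sb.append(')').toString();
        }
    }

    // Tokens are expected to alternate: operand, operator, operand, ...
    static Node build(String... tokens) {
        Node root = new Node(tokens[0]);
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            Node operator = new Node(tokens[i]);
            operator.add(root).add(new Node(tokens[i + 1]));
            root = operator; // the operator is promoted to be the new root
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(build("2", "+", "4").toStringTree()); // prints (+ 2 4)
    }
}
```

This differs from the rewrite rule in the answer, which collects all operands under a single PLUS node instead of nesting them to the left.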

Generating simple AST in ANTLR

I'm playing a bit around with ANTLR, and wish to create a function like this:
MOVE x y z pitch roll
That produces the following AST:
MOVE
|---x
|---y
|---z
|---pitch
|---roll
So far I've tried without luck, and I keep getting the AST to have the parameters as siblings, rather than children.
Code so far:
C#:
class Program
{
const string CRLF = "\r\n";
static void Main(string[] args)
{
string filename = "Script.txt";
var reader = new StreamReader(filename);
var input = new ANTLRReaderStream(reader);
var lexer = new ScorBotScriptLexer(input);
var tokens = new CommonTokenStream(lexer);
var parser = new ScorBotScriptParser(tokens);
var result = parser.program();
var tree = result.Tree as CommonTree;
Print(tree, "");
Console.Read();
}
static void Print(CommonTree tree, string indent)
{
Console.WriteLine(indent + tree.ToString());
if (tree.Children != null)
{
indent += "\t";
foreach (var child in tree.Children)
{
var childTree = child as CommonTree;
if (childTree.Text != CRLF)
{
Print(childTree, indent);
}
}
}
}
}
ANTLR:
grammar ScorBotScript;
options
{
language = 'CSharp2';
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
memoize = true;
}
@parser::namespace { RSD.Scripting }
@lexer::namespace { RSD.Scripting }
program
: (robotInstruction CRLF)*
;
robotInstruction
: moveCoordinatesInstruction
;
/**
* MOVE X Y Z PITCH ROLL
*/
moveCoordinatesInstruction
: 'MOVE' x=INT y=INT z=INT pitch=INT roll=INT
;
INT : '-'? ( '0'..'9' )*
;
COMMENT
: '//' ~( CR | LF )* CR? LF { $channel = HIDDEN; }
;
WS
: ( ' ' | TAB | CR | LF ) { $channel = HIDDEN; }
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
fragment TAB
: '\t'
;
fragment CR
: '\r'
;
fragment LF
: '\n'
;
CRLF
: (CR ? LF) => CR ? LF
| CR
;
parse
: ID
| INT
| COMMENT
| STRING
| WS
;
I'm a beginner with ANTLR myself, this confused me too.
I think if you want to create a tree from your grammar that has structure, you augment your grammar with hints using the ^ and ! characters. This examples page shows how.
From the linked page:
By default ANTLR creates trees as
"sibling lists".
The grammar must be annotated to with
tree commands to produce a parser that
creates trees in the correct shape
(that is, operators at the root, which
operands as children). A somewhat more
complicated expression parser can be
seen here and downloaded in tar form
here. Note that grammar terminals
which should be at the root of a
sub-tree are annotated with ^.

ANTLR: multiplication omitting '*' symbol

I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be included. I need it to output an AST. So for input like this:
1 2 / 3 4
I want the AST to be
(* (/ (* 1 2) 3) 4)
I've hit upon the following, which uses java code to create the appropriate nodes:
grammar TestProd;
options {
output = AST;
}
tokens {
PROD;
}
DIV : '/';
multExpr: (INTEGER -> INTEGER)
( {div = null;}
div=DIV? b=INTEGER
->
^({$div == null ? (Object)adaptor.create(PROD, "*") : (Object)adaptor.create(DIV, "/")}
$multExpr $b))*
;
INTEGER: ('0' | '1'..'9' '0'..'9'*);
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };
This works. But is there a better/simpler way?
Here's a way:
grammar Test;
options {
backtrack=true;
output=AST;
}
tokens {
MUL;
DIV;
}
parse
: expr* EOF
;
expr
: (atom -> atom)
( '/' a=atom -> ^(DIV $expr $a)
| a=atom -> ^(MUL $expr $a)
)*
;
atom
: Number
| '(' expr ')' -> expr
;
Number
: '0'..'9'+
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String[] args) throws Exception {
String source = "1 2 / 3 4";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.parse_return result = parser.parse();
Tree tree = (Tree)result.getTree();
System.out.println(tree.toStringTree());
}
}
produced:
(MUL (DIV (MUL 1 2) 3) 4)
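The $expr reference in the rewrite rules means "the tree built so far", so the loop folds the input to the left one operand at a time. A hand-written plain-Java sketch of that fold is shown below, assuming the tokens are separated by whitespace and ignoring parentheses; the class name ImplicitMulSketch and the toTree method are made up for illustration.

```java
// Hypothetical hand-rolled equivalent of the left fold done by the expr rule.
final class ImplicitMulSketch {
    static String toTree(String input) {
        String[] tokens = input.trim().split("\\s+");
        String tree = tokens[0];          // corresponds to (atom -> atom)
        int i = 1;
        while (i < tokens.length) {
            if (tokens[i].equals("/")) {
                if (i + 1 >= tokens.length) {
                    throw new IllegalArgumentException("Missing operand after '/'");
                }
                tree = "(DIV " + tree + " " + tokens[i + 1] + ")"; // '/' a=atom
                i += 2;
            } else {
                tree = "(MUL " + tree + " " + tokens[i] + ")";     // a=atom, implicit '*'
                i += 1;
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(toTree("1 2 / 3 4")); // prints (MUL (DIV (MUL 1 2) 3) 4)
    }
}
```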