I have a requirement to convert an identifier into a BeanUtils property path for retrieving an item from an object. The identifier-to-string conversions look like this:
name ==> name
attribute.name ==> attributes(name)[0].value
attribute.name[2] ==> attributes(name)[2].value
address.attribute.postalcode ==> contactDetails.addresses[0].attributes(postalcode)[0].value
address[2].attribute.postalcode ==> contactDetails.addresses[2].attributes(postalcode)[0].value
address[2].attribute.postalcode[3] ==> contactDetails.addresses[2].attributes(postalcode)[3].value
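For comparison with the parser-generator approach, the mapping above can also be written without ANTLR. The following is a minimal plain-Java sketch that uses java.util.regex and handles only the three shapes in the table: a plain name, attribute.NAME[M], and address[N].attribute.NAME[M], where a missing index defaults to [0]. IdentifierConverter and convert are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: converts the identifier shapes from the table into
// BeanUtils-style property paths without a parser generator.
final class IdentifierConverter {
    // address[N].attribute.NAME[M], where [N] is optional
    private static final Pattern ADDRESS =
            Pattern.compile("address(?:\\[(\\d+)\\])?\\.(attribute\\..+)");
    // attribute.NAME[M], where [M] is optional
    private static final Pattern ATTRIBUTE =
            Pattern.compile("attribute\\.([A-Za-z]+)(?:\\[(\\d+)\\])?");
    // a plain name such as "name"
    private static final Pattern VARNAME = Pattern.compile("[A-Za-z]+");

    private IdentifierConverter() {}

    // A missing index (null group) becomes [0], as in the examples.
    private static String index(String digits) {
        return "[" + (digits == null ? "0" : digits) + "]";
    }

    static String convert(String id) {
        Matcher m = ADDRESS.matcher(id);
        if (m.matches()) {
            // The part after "address[N]." is itself an attribute identifier.
            return "contactDetails.addresses" + index(m.group(1)) + "." + convert(m.group(2));
        }
        m = ATTRIBUTE.matcher(id);
        if (m.matches()) {
            return "attributes(" + m.group(1) + ")" + index(m.group(2)) + ".value";
        }
        if (VARNAME.matcher(id).matches()) {
            return id;
        }
        throw new IllegalArgumentException("unsupported identifier: " + id);
    }
}
```

For instance, convert("address[2].attribute.postalcode") returns "contactDetails.addresses[2].attributes(postalcode)[0].value". For a handful of fixed shapes like these, a regex is arguably simpler than a generated parser; a grammar tends to pay off once the shapes start to nest or multiply.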
I have decided to do this using ANTLR, as I feel it's probably going to be just as quick as using a set of 'if' statements. Feel free to tell me I'm wrong.
Right now I've got this partially working with ANTLR; however, once I start on the 'address' cases, the setText part seems to stop working for Attribute.
Am I doing this the correct way, or is there a better way of using ANTLR to get the result I want?
grammar AttributeParser;
parse returns [ String result ]
: Address EOF { $result = $Address.text; }
| Attribute EOF { $result = $Attribute.text; }
| Varname EOF { $result = $Varname.text; }
;
Address
: 'address' (Arraypos)* '.' Attribute { setText("contactDetails.addresses" + ($Arraypos == null ? "[0]" : $Arraypos.text ) + "." + $Attribute.text); }
;
Attribute
: 'attribute.' Varname (Arraypos)* { setText("attributes(" + $Varname.text + ")" + ($Arraypos == null ? "[0]" : $Arraypos.text ) + ".value"); }
;
Arraypos
: '[' Number+ ']'
;
Varname
: ('a'..'z'|'A'..'Z')+
;
Number
: '0'..'9'+
;
Spaces
: (' ' | '\t' | '\r' | '\n')+ { setText(" "); }
;
Below are two unit tests, the first returns what I expect, the second doesn't.
@Test
public void testSimpleAttributeWithArrayRef() throws Exception {
String source = "attribute.name[2]";
ANTLRStringStream in = new ANTLRStringStream(source);
AttributeParserLexer lexer = new AttributeParserLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AttributeParserParser parser = new AttributeParserParser(tokens);
String result = parser.parse();
assertEquals("attributes(name)[2].value", result);
}
@Test
public void testAddress() throws Exception {
String source = "address.attribute.postalcode";
ANTLRStringStream in = new ANTLRStringStream(source);
AttributeParserLexer lexer = new AttributeParserLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AttributeParserParser parser = new AttributeParserParser(tokens);
String result = parser.parse();
System.out.println("Result: " + result);
assertEquals("contactDetails.addresses[0].attributes(postalcode)[0].value", result);
}
No, you can't match (Arraypos)* and then refer to its contents as $Arraypos.text.
I wouldn't change the inner text of the tokens; instead, create a couple of parser rules and let them return the appropriate text.
A little demo:
grammar AttributeParser;
parse returns [String s]
: input EOF {$s = $input.s;}
;
input returns [String s]
: address {$s = $address.s;}
| attribute {$s = $attribute.s;}
| Varname {$s = $Varname.text;}
;
address returns [String s]
: Address arrayPos '.' attribute
{$s = "contactDetails.addresses" + $arrayPos.s + "." + $attribute.s;}
;
attribute returns [String s]
: Attribute '.' Varname arrayPos
{$s = "attributes(" + $Varname.text + ")" + $arrayPos.s + ".value" ;}
;
arrayPos returns [String s]
: Arraypos {$s = $Arraypos.text;}
| /* nothing */ {$s = "[0]";}
;
Attribute : 'attribute';
Address : 'address';
Arraypos : '[' '0'..'9'+ ']';
Varname : ('a'..'z' | 'A'..'Z')+;
which can be tested with:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String[][] tests = {
{"name", "name"},
{"attribute.name", "attributes(name)[0].value"},
{"attribute.name[2]", "attributes(name)[2].value"},
{"address.attribute.postalcode", "contactDetails.addresses[0].attributes(postalcode)[0].value"},
{"address[2].attribute.postalcode", "contactDetails.addresses[2].attributes(postalcode)[0].value"},
{"address[2].attribute.postalcode[3]", "contactDetails.addresses[2].attributes(postalcode)[3].value"}
};
for(String[] test : tests) {
String input = test[0];
String expected = test[1];
AttributeParserLexer lexer = new AttributeParserLexer(new ANTLRStringStream(input));
AttributeParserParser parser = new AttributeParserParser(new CommonTokenStream(lexer));
String output = parser.parse();
if(!output.equals(expected)) {
throw new RuntimeException(output + " != " + expected);
}
System.out.printf("in = %s\nout = %s\n\n", input, output);
}
}
}
And to run the demo do:
java -cp antlr-3.3.jar org.antlr.Tool AttributeParser.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
which will print the following to the console:
in = name
out = name
in = attribute.name
out = attributes(name)[0].value
in = attribute.name[2]
out = attributes(name)[2].value
in = address.attribute.postalcode
out = contactDetails.addresses[0].attributes(postalcode)[0].value
in = address[2].attribute.postalcode
out = contactDetails.addresses[2].attributes(postalcode)[0].value
in = address[2].attribute.postalcode[3]
out = contactDetails.addresses[2].attributes(postalcode)[3].value
EDIT
Note that you can also let parser rules return more than just one object like this:
bar
: foo {System.out.println($foo.text + ", " + $foo.number);}
;
foo returns [String text, int number]
: 'FOO' {$text = "a"; $number = 1;}
| 'foo' {$text = "b"; $number = 2;}
;
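In the Java target, a rule with several return values hands them back as one object with a field per return value; this is the same mechanism behind the TestParser.parse_return type used further down. The hand-written analogue below only illustrates that idea: FooResult and FooRule are hypothetical names, not code ANTLR generates.

```java
// Hypothetical hand-written analogue of:  foo returns [String text, int number]
final class FooResult {
    final String text;
    final int number;

    FooResult(String text, int number) {
        this.text = text;
        this.number = number;
    }
}

final class FooRule {
    private FooRule() {}

    // Mirrors the two alternatives of foo: 'FOO' -> ("a", 1) and 'foo' -> ("b", 2)
    static FooResult foo(String token) {
        switch (token) {
            case "FOO":
                return new FooResult("a", 1);
            case "foo":
                return new FooResult("b", 2);
            default:
                throw new IllegalArgumentException("expected FOO or foo, got: " + token);
        }
    }
}
```

With FooResult r = FooRule.foo("foo"), the expression r.text + ", " + r.number yields "b, 2", which corresponds to what rule bar prints via $foo.text and $foo.number.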
I actually have two questions, which I hope can be answered together since they are related in my work. Below are the grammar, the tree grammar and the Java test file.
What I am actually trying to achieve is the following:
Question 1:
I have a grammar that parses my language correctly. I would like to do some semantic checks on variable declarations, so I created a tree walker, and so far it half works. My problem is that it's not capturing the whole expression string. For example,
float x = 10 + 10;
It only captures the first part, i.e. 10, and I am not sure what I am doing wrong. If I do it in one pass, it works; but if I split the work into a grammar and a tree grammar, it does not capture the whole string.
Question 2:
I would like to check a rule such that, if my condition returns true, the corresponding subtree is removed. For example,
float x = 10;
float x; // <================ I would like this to be removed.
I have tried using rewrite rules but I think it is more complex than that.
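Setting the tree-rewrite mechanics aside for a moment, the condition itself is simple: keep the first declaration of each name and replace every later one. Below is a plain-Java sketch that works on already-parsed declarations; DeclFilter and Decl are hypothetical names, and the "(VARDECL type name (= expr))" strings are a notation chosen for this sketch rather than output produced by ANTLR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeclFilter {
    // One already-parsed declaration: "float x = 10;" or "float x;" (expr == null)
    static final class Decl {
        final String type;
        final String name;
        final String expr;

        Decl(String type, String name, String expr) {
            this.type = type;
            this.name = name;
            this.expr = expr;
        }

        String toStringTree() {
            String init = expr == null ? "" : " (= " + expr + ")";
            return "(VARDECL " + type + " " + name + init + ")";
        }
    }

    private DeclFilter() {}

    // Keeps the first declaration of each name and replaces redeclarations with "REMOVED".
    static List<String> filter(List<Decl> decls) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (Decl d : decls) {
            // Set.add returns false when the name has already been declared
            out.add(seen.add(d.name) ? d.toStringTree() : "REMOVED");
        }
        return out;
    }
}
```

For the two declarations float x = 10; and float x; this yields [(VARDECL float x (= 10)), REMOVED]. Using the boolean from Set.add keeps the lookup and the registration in one step.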
Test.g:
grammar Test;
options {
language = Java;
output = AST;
}
parse : varDeclare+
;
varDeclare : type id equalExp? ';'
;
equalExp : ('=' (expression | '...'))
;
expression : binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
TestTree.g:
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
}
@members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
parse[SemanticCheck s]
: varDeclare+
;
varDeclare : type id equalExp? ';'
{s.check($type.name, $id.text, $equalExp.expr);}
;
equalExp returns [String expr]
: ('=' (expression {$expr = $expression.e;} | '...' {$expr = "...";}))
;
expression returns [String e]
@after {$e = $expression.text;}
: binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type returns [String name]
@after { $name = $type.text; }
: 'int'
| 'float'
;
Java test file, Test.java:
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RuleReturnScope;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10+y; \n" +
"float x; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
//TestLexer lexer = new TestLexer(new ANTLRFileStream("input.txt"));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
RuleReturnScope r = parser.parse();
System.out.println("Parse Tree:\n" + tokenStream.toString());
CommonTree t = (CommonTree)r.getTree();
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
walker.parse(s);
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
System.out.println("Remove statement! Already defined!");
return true;
}
names.add(variableName);
return false;
}
}
Thanks in advance!
I figured out my problem: it turns out I needed to build an AST first before I could do anything. This related question helps in understanding what a flat tree looks like versus a proper AST:
How to output the AST built using ANTLR?
Thanks to Bart's countless examples here on Stack Overflow, I was able to use semantic predicates to do what I needed in the example above.
Below is the updated code:
Test.g
grammar Test;
options {
language = Java;
output = AST;
}
tokens {
VARDECL;
Assign = '=';
EqT = '==';
NEq = '!=';
LT = '<';
LTEq = '<=';
GT = '>';
GTEq = '>=';
NOT = '!';
PLUS = '+';
MINUS = '-';
MULT = '*';
DIV = '/';
}
parse : varDeclare+
;
varDeclare : type id equalExp ';' -> ^(VARDECL type id equalExp)
;
equalExp : (Assign^ (expression | '...' ))
;
expression : binaryExpression
;
binaryExpression : addingExpression ((EqT|NEq|LTEq|GTEq|LT|GT)^ addingExpression)*
;
addingExpression : multiplyingExpression ((PLUS|MINUS)^ multiplyingExpression)*
;
multiplyingExpression : unaryExpression
((MULT|DIV)^ unaryExpression)*
;
unaryExpression: ((NOT|MINUS))^ primitiveElement
| primitiveElement
;
primitiveElement : literalExpression
| id
| '(' expression ')' -> expression
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
This should automatically build an AST whenever you have varDeclare. Now on to the tree grammar/walker.
TestTree.g
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
output = AST;
}
tokens {
REMOVED;
}
@members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
start[SemanticCheck s] : varDeclare+
;
varDeclare : ^(VARDECL type id equalExp)
-> {s.check($type.text, $id.text, $equalExp.text)}? REMOVED
-> ^(VARDECL type id equalExp)
;
equalExp : ^(Assign expression)
| ^(Assign '...')
;
expression : ^(('!') expression)
| ^(('+'|'-'|'*'|'/') expression expression*)
| ^(('=='|'<='|'<'|'>='|'>'|'!=') expression expression*)
| literalExpression
;
literalExpression : INT
| id
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
Now on to test it:
Test.java
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10; \n" +
"int x = 1; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
TestParser.parse_return r = parser.parse();
System.out.println("Tree:" + ((Tree)r.tree).toStringTree() + "\n");
CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)r.tree);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
TestTree.start_return r2 = walker.start(s);
System.out.println("\nTree Walker: "+((Tree)r2.tree).toStringTree());
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
return true;
}
names.add(variableName);
return false;
}
}
Output:
Tree:(VARDECL float x (= 10)) (VARDECL int x (= 1))
Type: float variableName: x exp: = 10
Type: int variableName: x exp: = 1
Tree Walker: (VARDECL float x (= 10)) REMOVED
Hope this helps! Please feel free to point out any errors.
I would like to know how to evaluate an ifdef ... else ... end statement that could be placed anywhere in the code. I started with a simple example that implements two basic functions, add(p1,p2) and diff(p1,p2), where p1 and p2 are Strings; they only insert a + or a - between p1 and p2. Here is my grammar:
grammar ifdef;
options {
language = Java;
output = AST;
ASTLabelType=CommonTree;
}
tokens
{
EQUAL = '=' ;
HASH = '#' ;
DBLEQUOTE = '"' ;
SEMICOLON = ';' ;
}
@header {
package Grammar;
import java.util.Map;
import java.util.HashMap;
}
@lexer::header {
package Grammar;
}
@members {
private Map<String, String> strMapID = new HashMap<String, String>();
private Map<String, String> strMapDefine = new HashMap<String, String>();
}
rule returns [String strEval]
: { StringBuilder strBuilder = new StringBuilder(); }
( command
{ if ( !"".equals($command.str) ) { // compare String contents, not references
strBuilder.append( $command.str );
strBuilder.append( "\n" );
}
}
)+ EOF
{ $strEval = strBuilder.toString(); }
;
command returns [String str]
: define { $str=""; }
| undef { $str=""; }
| set { $str=""; }
| function { $str = $function.str; }
| ifdef { $str=""; }
;
define
: HASH 'define' ID
{ strMapDefine.put($ID.text, $ID.text); } // save define ID into hash table
;
undef
: HASH 'undef' ID
{ if ( strMapDefine.containsKey($ID.text) ) {
strMapDefine.remove($ID.text); // undef ID in hash table
}
}
;
set
: 'set' ID EQUAL string SEMICOLON
{ strMapID.put($ID.text, $string.text); } // save ID,string definition into hash table
;
string
: DBLEQUOTE expr DBLEQUOTE
;
function returns [String str]
: add { $str = $add.str; }
| diff { $str = $diff.str; }
;
add returns [String str]
: 'add' '(' p1=param ',' p2=param ')'
{ StringBuilder strBuilder = new StringBuilder();
strBuilder.append( $p1.str );
strBuilder.append( "+" );
strBuilder.append( $p2.str );
$str = strBuilder.toString();
}
;
diff returns [String str]
: 'diff' '(' p1=param ',' p2=param ')'
{ StringBuilder strBuilder = new StringBuilder();
strBuilder.append( $p1.str );
strBuilder.append( "-" );
strBuilder.append( $p2.str );
$str = strBuilder.toString();
}
;
param returns [String str]
: ID { $str = strMapID.get($ID.text); } // assign ID definition to str
| string { $str = $string.text; }
| function { $str = $function.str; }
;
ifdef
: { Boolean bElse= false;
StringBuilder strBuilder = new StringBuilder(); }
HASH 'ifdef' '(' ID ')' c1=expr ( HASH 'else' c2=expr { bElse= true; } )? HASH 'end'
{ if ( strMapDefine.containsKey($ID.text) ) {
strBuilder.append($c1.text);
}
else {
if ( bElse ) {
strBuilder.append($c2.text);
}
}
System.out.println("ifdef content is : " + strBuilder.toString() + "\n" );
}
;
expr
: .+
;
ID
: ('a'..'z' | 'A'..'Z')+
;
WS
: ( ' ' | '\t' | '\n' | '\r' ) {$channel=HIDDEN;}
;
and the Java main class:
package Grammar;
import java.io.IOException;
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.TokenStream;
import Grammar.ifdefParser.rule_return;
public class mainifdef {
public static void main(String[] args) throws RecognitionException {
CharStream stream=null;
try {
stream = new ANTLRFileStream("src/input/test.txt");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
ifdefLexer lexer = new ifdefLexer(stream);
TokenStream tokenStream = new CommonTokenStream(lexer);
ifdefParser parser = new ifdefParser(tokenStream);
rule_return evaluator = parser.rule();
System.out.println("Parsing Tree is \n" + evaluator.tree.toStringTree() + "\n");
System.out.println("Evaluation is \n" + evaluator.strEval + "\n");
}
}
With input file content like:
#undef Z
set AB = "toto";
set CD = "titi";
diff(CD,"essai")
#ifdef(Z) add( #end
add ( AB , diff( CD,"essai") )
The console output is as expected:
ifdef content is :
Parsing Tree is
# undef Z set AB = " toto " ; set CD = " titi " ; diff ( CD , " essai " ) # ifdef ( Z ) add ( # end add ( AB , diff ( CD , " essai " ) )
Evaluation is
"titi"-"essai"
"toto"+"titi"-"essai"
My question is how to evaluate input text like the following, which should give the same result:
#undef Z
set AB = "toto";
set CD = "titi";
diff(CD,"essai")
#ifdef(Z) add( #end
add ( AB , #ifdef(Z) add( #else diff( #end CD,"essai") )
The console output is as follows (which is normal ANTLR behavior):
ifdef content is :
src/input/test.txt line 8:11 no viable alternative at input '#'
src/input/test.txt line 8:43 missing EOF at 'CD'
ifdef content is : diff(
Parsing Tree is
# undef Z set AB = " toto " ; set CD = " titi " ; diff ( CD , " essai " ) # ifdef ( Z ) add ( # end add ( AB , < unexpected: [#59,94:94='#',< 6 >,8:11], resync=# > < missing ')'> # ifdef ( Z ) add ( # else diff ( # end < missing EOF >
Evaluation is
"titi"-"essai"
"toto"+null
Some guidance on where to start would be much appreciated.
Regards JPM
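One possible starting point, not the only one, is to resolve #define/#undef/#ifdef/#else/#end in a separate pass before the main parser runs, the way the C preprocessor does; the grammar then never sees a half-finished add( or diff( call. The sketch below assumes the directive syntax exactly as in the question's input and strips all directives from the output; IfdefPreprocessor and preprocess are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a separate preprocessing pass; directive syntax taken from the question.
final class IfdefPreprocessor {
    // group 1/2: #define X or #undef X; group 3: #ifdef(X); group 4: #else or #end
    private static final Pattern DIRECTIVE = Pattern.compile(
            "#(define|undef)\\s+([A-Za-z]+)"
            + "|#ifdef\\s*\\(\\s*([A-Za-z]+)\\s*\\)"
            + "|#(else|end)");

    // One open #ifdef block.
    private static final class Frame {
        final boolean parentActive; // was the enclosing text being kept?
        final boolean defined;      // was the symbol defined when #ifdef was seen?
        boolean inElse;             // has the matching #else been passed?

        Frame(boolean parentActive, boolean defined) {
            this.parentActive = parentActive;
            this.defined = defined;
        }

        boolean active() {
            return parentActive && (inElse ? !defined : defined);
        }
    }

    private IfdefPreprocessor() {}

    // Returns the source with directives removed and inactive branches dropped.
    static String preprocess(String src) {
        Set<String> symbols = new HashSet<>();
        Deque<Frame> open = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        Matcher m = DIRECTIVE.matcher(src);
        int last = 0;
        while (m.find()) {
            boolean active = open.isEmpty() || open.peek().active();
            if (active) {
                out.append(src, last, m.start()); // keep the text before this directive
            }
            last = m.end();
            if (m.group(1) != null) {              // #define X / #undef X
                if (active) {
                    if (m.group(1).equals("define")) {
                        symbols.add(m.group(2));
                    } else {
                        symbols.remove(m.group(2));
                    }
                }
            } else if (m.group(3) != null) {       // #ifdef(X)
                open.push(new Frame(active, symbols.contains(m.group(3))));
            } else if (open.isEmpty()) {
                throw new IllegalStateException("#" + m.group(4) + " without #ifdef");
            } else if (m.group(4).equals("else")) {
                open.peek().inElse = true;
            } else {                               // #end
                open.pop();
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalStateException("unterminated #ifdef");
        }
        out.append(src, last, src.length());
        return out.toString();
    }
}
```

The preprocessed string could then be fed to the existing lexer through an ANTLRStringStream instead of an ANTLRFileStream. With Z undefined, the line add ( AB , #ifdef(Z) add( #else diff( #end CD,"essai") ) becomes add ( AB , diff( CD,"essai") ) (modulo whitespace), the same text as the first input file. One trade-off: dropped branches take their newlines with them, so line numbers in later parser errors may no longer match the original file.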
I've been trying to learn ANTLR for some time and finally got my hands on The Definitive ANTLR Reference.
Well, I tried the following in ANTLRWorks 1.4:
grammar Test;
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
expression
: INT ('+'^ INT)*;
When I pass 2+4 and process expression, I don't get a tree with + as the root and 2 and 4 as the child nodes. Rather, I get expression as the root and 2, + and 4 as child nodes at the same level.
I can't figure out what I am doing wrong and need help desperately.
BTW, how can I get those graphical tree diagrams?
Yes, you get expression as the root because expression, your only rule, is what is being returned.
I have just added a virtual token PLUS to your example, along with a rewrite rule that shows the result you are expecting.
But it seems that you have already found the solution :o)
grammar Test;
options {
output=AST;
ASTLabelType = CommonTree;
}
tokens {PLUS;}
@members {
public static void main(String [] args) {
try {
TestLexer lexer =
new TestLexer(new ANTLRStringStream("2+2"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.expression_return p_result = parser.expression();
CommonTree ast = p_result.tree;
if( ast == null ) {
System.out.println("resultant tree: is NULL");
} else {
System.out.println("resultant tree: " + ast.toStringTree());
}
} catch(Exception e) {
e.printStackTrace();
}
}
}
expression
: INT ('+' INT)* -> ^(PLUS INT+);
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
I'm playing around a bit with ANTLR and wish to create a function like this:
MOVE x y z pitch roll
That produces the following AST:
MOVE
|---x
|---y
|---z
|---pitch
|---roll
So far I've had no luck: I keep getting an AST with the parameters as siblings rather than children.
Code so far:
C#:
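using System;
using System.IO;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using RSD.Scripting; // namespace of the generated lexer and parser (see the grammar's namespace actions)
class Program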
{
const string CRLF = "\r\n";
static void Main(string[] args)
{
string filename = "Script.txt";
var reader = new StreamReader(filename);
var input = new ANTLRReaderStream(reader);
var lexer = new ScorBotScriptLexer(input);
var tokens = new CommonTokenStream(lexer);
var parser = new ScorBotScriptParser(tokens);
var result = parser.program();
var tree = result.Tree as CommonTree;
Print(tree, "");
Console.Read();
}
static void Print(CommonTree tree, string indent)
{
Console.WriteLine(indent + tree.ToString());
if (tree.Children != null)
{
indent += "\t";
foreach (var child in tree.Children)
{
var childTree = child as CommonTree;
if (childTree.Text != CRLF)
{
Print(childTree, indent);
}
}
}
}
} // closes class Program
ANTLR:
grammar ScorBotScript;
options
{
language = 'CSharp2';
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
memoize = true;
}
@parser::namespace { RSD.Scripting }
@lexer::namespace { RSD.Scripting }
program
: (robotInstruction CRLF)*
;
robotInstruction
: moveCoordinatesInstruction
;
/**
* MOVE X Y Z PITCH ROLL
*/
moveCoordinatesInstruction
: 'MOVE' x=INT y=INT z=INT pitch=INT roll=INT
;
INT : '-'? ( '0'..'9' )+ // '+' rather than '*', so INT never matches an empty string
;
COMMENT
: '//' ~( CR | LF )* CR? LF { $channel = HIDDEN; }
;
WS
: ( ' ' | TAB | CR | LF ) { $channel = HIDDEN; }
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
fragment TAB
: '\t'
;
fragment CR
: '\r'
;
fragment LF
: '\n'
;
CRLF
: (CR ? LF) => CR ? LF
| CR
;
parse
: ID
| INT
| COMMENT
| STRING
| WS
;
I'm a beginner with ANTLR myself; this confused me too.
I think that if you want your grammar to create a tree with structure, you augment it with hints using the ^ and ! operators. This examples page shows how.
From the linked page:
By default ANTLR creates trees as "sibling lists". The grammar must be annotated with tree commands to produce a parser that creates trees in the correct shape (that is, operators at the root, with operands as children). A somewhat more complicated expression parser can be seen here and downloaded in tar form here. Note that grammar terminals which should be at the root of a sub-tree are annotated with ^.
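To make the "sibling list" versus "operator at the root" distinction concrete, here is a tiny stand-in tree class whose toStringTree() imitates the output format of ANTLR 3's toStringTree(): a nil root prints its children flat, while a real root prints as (root child ...). AstNode is a hypothetical illustration, not part of the ANTLR runtime, and it only builds one level of leaves.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an AST node (not ANTLR's CommonTree).
final class AstNode {
    final String text;                          // null = "nil" root holding a sibling list
    final List<AstNode> children = new ArrayList<>();

    AstNode(String text) {
        this.text = text;
    }

    // Appends one leaf child per argument and returns this node for chaining.
    AstNode add(String... childTexts) {
        for (String t : childTexts) {
            children.add(new AstNode(t));
        }
        return this;
    }

    // "(root child ...)" for a rooted tree, "child child ..." for a nil root.
    String toStringTree() {
        if (children.isEmpty()) {
            return text == null ? "nil" : text;
        }
        StringBuilder sb = new StringBuilder();
        if (text != null) {
            sb.append('(').append(text).append(' ');
        }
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(children.get(i).toStringTree());
        }
        if (text != null) {
            sb.append(')');
        }
        return sb.toString();
    }
}
```

A nil root with children MOVE, 1, 2, 3, 4, 5 prints as MOVE 1 2 3 4 5, which is the flat shape from the MOVE question. A root MOVE with the five numbers as children prints as (MOVE 1 2 3 4 5), the shape that annotating 'MOVE' with ^ produces. Likewise, (+ 2 4) is the rooted form of 2 + 4.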
I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be included. I need it to output an AST. So for input like this:
1 2 / 3 4
I want the AST to be
(* (/ (* 1 2) 3) 4)
I've hit upon the following, which uses Java code to create the appropriate nodes:
grammar TestProd;
options {
output = AST;
}
tokens {
PROD;
}
DIV : '/';
multExpr: (INTEGER -> INTEGER)
( {div = null;}
div=DIV? b=INTEGER
->
^({$div == null ? (Object)adaptor.create(PROD, "*") : (Object)adaptor.create(DIV, "/")}
$multExpr $b))*
;
INTEGER: ('0' | '1'..'9' '0'..'9'*);
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };
This works. But is there a better/simpler way?
Here's a way:
grammar Test;
options {
backtrack=true;
output=AST;
}
tokens {
MUL;
DIV;
}
parse
: expr* EOF
;
expr
: (atom -> atom)
( '/' a=atom -> ^(DIV $expr $a)
| a=atom -> ^(MUL $expr $a)
)*
;
atom
: Number
| '(' expr ')' -> expr
;
Number
: '0'..'9'+
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String[] args) throws Exception {
String source = "1 2 / 3 4";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.parse_return result = parser.parse();
Tree tree = (Tree)result.getTree();
System.out.println(tree.toStringTree());
}
}
produced:
(MUL (DIV (MUL 1 2) 3) 4)
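The (atom -> atom) ( ... -> ^(MUL $expr $a) )* pattern is a left fold: each new operand wraps the tree built so far as its left child. The same idea in plain Java looks like the sketch below. It assumes whitespace-separated integer tokens and no parentheses, and ImplicitMulFolder is a hypothetical name; it builds the tree as a string only to show the shape.

```java
// Sketch: folds "1 2 / 3 4" into the same shape as the grammar's rewrite rules.
// Assumes whitespace-separated integer tokens; parentheses are not handled.
final class ImplicitMulFolder {
    private ImplicitMulFolder() {}

    private static String operand(String tok) {
        if (!tok.matches("\\d+")) {
            throw new IllegalArgumentException("expected a number, got: '" + tok + "'");
        }
        return tok;
    }

    static String fold(String source) {
        String[] toks = source.trim().split("\\s+");
        int i = 0;
        String tree = operand(toks[i++]);          // (atom -> atom)
        while (i < toks.length) {
            String op = "MUL";                     // no operator: implicit multiplication
            if (toks[i].equals("/")) {
                op = "DIV";
                i++;
            }
            if (i >= toks.length) {
                throw new IllegalArgumentException("missing operand after '/'");
            }
            // ^(OP $expr $a): the tree built so far becomes the left child
            tree = "(" + op + " " + tree + " " + operand(toks[i++]) + ")";
        }
        return tree;
    }
}
```

ImplicitMulFolder.fold("1 2 / 3 4") returns (MUL (DIV (MUL 1 2) 3) 4), matching the parser output above. Because the loop always wraps the accumulated tree on the left, the operators come out left-associative, which is what the $expr reference in the rewrite achieves.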