I have actually two questions that I hope can be answered as they are semi-dependent on my work. Below is the grammar + tree grammar + Java test file.
What I am actually trying to achieve is the following:
Question 1:
I have a grammar that parses my language correctly. I would like to do some semantic checks on variable declarations. So I created a tree walker and so far it semi works. My problem is it's not capturing the whole string of expression. For example,
float x = 10 + 10;
It is only capturing the first part, i.e. 10. I am not sure what I am doing wrong. If I did it in one pass, it works. Somehow, if I split the work into a grammar and tree grammar, it is not capturing the whole string.
Question 2:
I would like to do a check on a rule such that if my conditions returns true, I would like to remove that subtree. For example,
float x = 10;
float x; // <================ I would like this to be removed.
I have tried using rewrite rules but I think it is more complex than that.
Test.g:
grammar Test;
options {
language = Java;
output = AST;
}
parse : varDeclare+
;
varDeclare : type id equalExp? ';'
;
equalExp : ('=' (expression | '...'))
;
expression : binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
TestTree.g:
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
}
#members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
parse[SemanticCheck s]
: varDeclare+
;
varDeclare : type id equalExp? ';'
{s.check($type.name, $id.text, $equalExp.expr);}
;
equalExp returns [String expr]
: ('=' (expression {$expr = $expression.e;} | '...' {$expr = "...";}))
;
expression returns [String e]
#after {$e = $expression.text;}
: binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type returns [String name]
#after { $name = $type.text; }
: 'int'
| 'float'
;
Java test file, Test.java:
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RuleReturnScope;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10+y; \n" +
"float x; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
//TestLexer lexer = new TestLexer(new ANTLRFileStream("input.txt"));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
RuleReturnScope r = parser.parse();
System.out.println("Parse Tree:\n" + tokenStream.toString());
CommonTree t = (CommonTree)r.getTree();
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
walker.parse(s);
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
System.out.println("Remove statement! Already defined!");
return true;
}
names.add(variableName);
return false;
}
}
Thanks in advance!
I figured out my problem and it turns out I needed to build an AST first before I can do anything. This would help in understanding what is a flat tree look like vs building an AST.
How to output the AST built using ANTLR?
Thanks to Bart's endless examples here in StackOverFlow, I was able to do semantic predicates to do what I needed in the example above.
Below is the updated code:
Test.g
grammar Test;
options {
language = Java;
output = AST;
}
tokens {
VARDECL;
Assign = '=';
EqT = '==';
NEq = '!=';
LT = '<';
LTEq = '<=';
GT = '>';
GTEq = '>=';
NOT = '!';
PLUS = '+';
MINUS = '-';
MULT = '*';
DIV = '/';
}
parse : varDeclare+
;
varDeclare : type id equalExp ';' -> ^(VARDECL type id equalExp)
;
equalExp : (Assign^ (expression | '...' ))
;
expression : binaryExpression
;
binaryExpression : addingExpression ((EqT|NEq|LTEq|GTEq|LT|GT)^ addingExpression)*
;
addingExpression : multiplyingExpression ((PLUS|MINUS)^ multiplyingExpression)*
;
multiplyingExpression : unaryExpression
((MULT|DIV)^ unaryExpression)*
;
unaryExpression: ((NOT|MINUS))^ primitiveElement
| primitiveElement
;
primitiveElement : literalExpression
| id
| '(' expression ')' -> expression
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
This should automatically build an AST whenever you have varDeclare. Now on to the tree grammar/walker.
TestTree.g
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
output = AST;
}
tokens {
REMOVED;
}
#members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
start[SemanticCheck s] : varDeclare+
;
varDeclare : ^(VARDECL type id equalExp)
-> {s.check($type.text, $id.text, $equalExp.text)}? REMOVED
-> ^(VARDECL type id equalExp)
;
equalExp : ^(Assign expression)
| ^(Assign '...')
;
expression : ^(('!') expression)
| ^(('+'|'-'|'*'|'/') expression expression*)
| ^(('=='|'<='|'<'|'>='|'>'|'!=') expression expression*)
| literalExpression
;
literalExpression : INT
| id
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
Now on to test it:
Test.java
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10; \n" +
"int x = 1; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
TestParser.parse_return r = parser.parse();
System.out.println("Tree:" + ((Tree)r.tree).toStringTree() + "\n");
CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)r.tree);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
TestTree.start_return r2 = walker.start(s);
System.out.println("\nTree Walker: "+((Tree)r2.tree).toStringTree());
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
return true;
}
names.add(variableName);
return false;
}
}
Output:
Tree:(VARDECL float x (= 10)) (VARDECL int x (= 1))
Type: float variableName: x exp: = 10
Type: int variableName: x exp: = 1
Tree Walker: (VARDECL float x (= 10)) REMOVED
Hope this helps! Please feel free to point any errors if I did something wrong.
Related
I am using the following ANTLR grammar to define a function.
definition_function
: DEFINE FUNCTION function_name '[' language_name ']'
RETURN attribute_type '{' function_body '}'
;
function_name
: id
;
language_name
: id
;
function_body
: SCRIPT
;
SCRIPT
: '{' ('\u0020'..'\u007e' | ~( '{' | '}' ) )* '}'
{ setText(getText().substring(1, getText().length()-1)); }
;
But when I try to parse two functions like below,
define function concat[Scala] return string {
var concatenatedString = ""
for(i <- 0 until data.length) {
concatenatedString += data(i).toString
}
concatenatedString
};
define function concat[JavaScript] return string {
var str1 = data[0];
var str2 = data[1];
var str3 = data[2];
var res = str1.concat(str2,str3);
return res;
};
Then ANTLR doesn't parse this like two function definitions, but like a single function with the following body,
var concatenatedString = ""
for(i <- 0 until data.length) {
concatenatedString += data(i).toString
}
concatenatedString
};
define function concat[JavaScript] return string {
var str1 = data[0];
var str2 = data[1];
var str3 = data[2];
var res = str1.concat(str2,str3);
return res;
Can you explain this behavior? The body of the function can have anything in it. How can I correctly define this grammar?
Your rule matches that much because '\u0020'..'\u007e' from the rule '{' ('\u0020'..'\u007e' | ~( '{' | '}' ) )* '}' matches both { and }.
Your rule should work if you define it like this:
SCRIPT
: '{' ( SCRIPT | ~( '{' | '}' ) )* '}'
;
However, this will fail when the script block contains, says, strings or comments that contain { or }. Here is a way to match a SCRIPT token, including comments and string literals that could contain { and '}':
SCRIPT
: '{' SCRIPT_ATOM* '}'
;
fragment SCRIPT_ATOM
: ~[{}]
| '"' ~["]* '"'
| '//' ~[\r\n]*
| SCRIPT
;
A complete grammar that properly parses your input would then look like this:
grammar T;
parse
: definition_function* EOF
;
definition_function
: DEFINE FUNCTION function_name '[' language_name ']' RETURN attribute_type SCRIPT ';'
;
function_name
: ID
;
language_name
: ID
;
attribute_type
: ID
;
DEFINE
: 'define'
;
FUNCTION
: 'function'
;
RETURN
: 'return'
;
ID
: [a-zA-Z_] [a-zA-Z_0-9]*
;
SCRIPT
: '{' SCRIPT_ATOM* '}'
;
SPACES
: [ \t\r\n]+ -> skip
;
fragment SCRIPT_ATOM
: ~[{}]
| '"' ~["]* '"'
| '//' ~[\r\n]*
| SCRIPT
;
which also parses the following input properly:
define function concat[JavaScript] return string {
for (;;) {
while (true) { }
}
var s = "}"
// }
return s
};
Unless you absolutely need SCRIPT to be a token (recognized by a lexer rule), you can use a parser rule which recognizes nested blocks (the block rule below). The grammar included here should parse your example as two distinct function definitions.
DEFINE : 'define';
FUNCTION : 'function';
RETURN : 'return';
ID : [A-Za-z]+;
ANY : . ;
WS : [ \r\t\n]+ -> skip ;
test : definition_function* ;
definition_function
: DEFINE FUNCTION function_name '[' language_name ']'
RETURN attribute_type block ';'
;
function_name : id ;
language_name : id ;
attribute_type : 'string' ;
id : ID;
block
: '{' ( ( ~('{'|'}') )+ | block)* '}'
;
I'm trying to generate a Lexer/Parser through a simple grammar using ANTLR. What I've done right now :
grammar ExprV2;
#header {
package mypack.parte2;
}
#lexer::header {
package mypack.parte2;
}
start
: expr EOF { System.out.println($expr.val);}
;
expr returns [int val]
: term e=exprP[$term.val] { $val = $e.val; }
;
exprP[int i] returns [int val]
: { $val = $i; }
| '+' term e=exprP[$i + $term.val] { $val = $e.val; }
| '-' term e=exprP[$i - $term.val] { $val = $e.val; }
;
term returns [int val]
: fact e=termP[$fact.val] { $val = $e.val; }
;
termP[int i] returns [int val]
: {$val = $i;}
| '*' fact e=termP[$i * $fact.val] {$val = $e.val; }
| '/' fact e=termP[$i / $fact.val] {$val = $e.val; }
;
fact returns [int val]
: '(' expr ')' { $val = $expr.val; }
| NUM { $val=Integer.parseInt($NUM.text); }
;
NUM : '0'..'9'+ ;
WS : (' ' | '\t' |'\r' | '\n')+ { skip(); };
What I would like to obtain is to generate a Lexer/Parser with a grammar written using EBNF but I'm stucked and I don't know how to go ahead. I looked on the internet but I did not succeed in finding anything clear. Thanks to all!
I'm trying to do the following rewrite of the multiplication operator as repeated additions:
(* a t=INT) -> (+ a (+ a (+ a (+ ... + a) ... )) (t times)
Is there a way to do this in a single pass in ANTLR using a tree rewrite rule?
If not, what is the best way to go about it?
I have to do this rewriting multiple times, for each occurrence of '*', and the corresponding t's are parsed. Therefore, there is no fixed bound on the t's.
I managed to solve the problem in multiple passes. I compute the max number of passes while parsing the expression and apply the tree rewrite rules multiple times. I don't even need backtrack to be true. See code below.
Expr.g -> lexer, parser grammar
grammar Expr;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
MULT='*';
ADD='+';
}
#header{
import java.lang.Math;
}
#members {
public int limit=0;
}
prog : expr {limit=$expr.value;} ;
expr returns [int value]
: a=multExpr {$value=$a.value;} (ADD^ b=multExpr {$value=Math.max($value, $b.value);})* ;
multExpr returns [int value]
: primary {$value=$primary.value;} (MULT^ c=INT {$value=Math.max($value, $c.int);})? ;
primary returns[int value]
: ID {$value = 0;}
| '('! expr ')'! {$value = $expr.value;}
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
WS : (' '|'\r'|'\n')+ {skip();} ;
Eval.g -> tree rewrite grammar with main program
tree grammar Eval;
options {
tokenVocab=Expr;
ASTLabelType=CommonTree;
output=AST;
}
#members {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
ExprLexer lexer = new ExprLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExprParser parser = new ExprParser(tokens);
CommonTree t = null;
try {
t = (CommonTree) parser.prog().getTree();
} catch(RecognitionException re){
re.printStackTrace();
}
System.out.println("Tree: " + t.toStringTree());
System.out.println();
int loops = parser.limit;
System.out.println("Number of loops:" + loops);
System.out.println();
for(int i=0; i<loops; i++) {
System.out.println("Loop:" + (i+1));
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Eval s = new Eval(nodes);
t = (CommonTree)s.prog().getTree();
System.out.println("Simplified tree: "+t.toStringTree());
System.out.println();
}
}
}
prog : expr ;
expr
: ^(ADD a=expr b=expr)
| ^(MULT a=expr t=INT) ( {$t.int>1}?=> -> ^(ADD["+"] $a ^(MULT["*"] $a INT[String.valueOf($t.int - 1)]))
| {$t.int==1}?=> -> $a )
| INT
| ID
;
I'm playing a bit around with ANTLR, and wish to create a function like this:
MOVE x y z pitch roll
That produces the following AST:
MOVE
|---x
|---y
|---z
|---pitch
|---roll
So far I've tried without luck, and I keep getting the AST to have the parameters as siblings, rather than children.
Code so far:
C#:
class Program
{
const string CRLF = "\r\n";
static void Main(string[] args)
{
string filename = "Script.txt";
var reader = new StreamReader(filename);
var input = new ANTLRReaderStream(reader);
var lexer = new ScorBotScriptLexer(input);
var tokens = new CommonTokenStream(lexer);
var parser = new ScorBotScriptParser(tokens);
var result = parser.program();
var tree = result.Tree as CommonTree;
Print(tree, "");
Console.Read();
}
static void Print(CommonTree tree, string indent)
{
Console.WriteLine(indent + tree.ToString());
if (tree.Children != null)
{
indent += "\t";
foreach (var child in tree.Children)
{
var childTree = child as CommonTree;
if (childTree.Text != CRLF)
{
Print(childTree, indent);
}
}
}
}
ANTLR:
grammar ScorBotScript;
options
{
language = 'CSharp2';
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
memoize = true;
}
#parser::namespace { RSD.Scripting }
#lexer::namespace { RSD.Scripting }
program
: (robotInstruction CRLF)*
;
robotInstruction
: moveCoordinatesInstruction
;
/**
* MOVE X Y Z PITCH ROLL
*/
moveCoordinatesInstruction
: 'MOVE' x=INT y=INT z=INT pitch=INT roll=INT
;
INT : '-'? ( '0'..'9' )*
;
COMMENT
: '//' ~( CR | LF )* CR? LF { $channel = HIDDEN; }
;
WS
: ( ' ' | TAB | CR | LF ) { $channel = HIDDEN; }
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
fragment TAB
: '\t'
;
fragment CR
: '\r'
;
fragment LF
: '\n'
;
CRLF
: (CR ? LF) => CR ? LF
| CR
;
parse
: ID
| INT
| COMMENT
| STRING
| WS
;
I'm a beginner with ANTLR myself, this confused me too.
I think if you want to create a tree from your grammar that has structure, you augment your grammar with hints using the ^ and ! characters. This examples page shows how.
From the linked page:
By default ANTLR creates trees as
"sibling lists".
The grammar must be annotated to with
tree commands to produce a parser that
creates trees in the correct shape
(that is, operators at the root, which
operands as children). A somewhat more
complicated expression parser can be
seen here and downloaded in tar form
here. Note that grammar terminals
which should be at the root of a
sub-tree are annotated with ^.
I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be included. I need it to output an AST. So for input like this:
1 2 / 3 4
I want the AST to be
(* (/ (* 1 2) 3) 4)
I've hit upon the following, which uses java code to create the appropriate nodes:
grammar TestProd;
options {
output = AST;
}
tokens {
PROD;
}
DIV : '/';
multExpr: (INTEGER -> INTEGER)
( {div = null;}
div=DIV? b=INTEGER
->
^({$div == null ? (Object)adaptor.create(PROD, "*") : (Object)adaptor.create(DIV, "/")}
$multExpr $b))*
;
INTEGER: ('0' | '1'..'9' '0'..'9'*);
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };
This works. But is there a better/simpler way?
Here's a way:
grammar Test;
options {
backtrack=true;
output=AST;
}
tokens {
MUL;
DIV;
}
parse
: expr* EOF
;
expr
: (atom -> atom)
( '/' a=atom -> ^(DIV $expr $a)
| a=atom -> ^(MUL $expr $a)
)*
;
atom
: Number
| '(' expr ')' -> expr
;
Number
: '0'..'9'+
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String[] args) throws Exception {
String source = "1 2 / 3 4";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.parse_return result = parser.parse();
Tree tree = (Tree)result.getTree();
System.out.println(tree.toStringTree());
}
}
produced:
(MUL (DIV (MUL 1 2) 3) 4)