I have a grammar that contains a rule like this:
stmt -> ID := expr
| print( expr )
| if( expr ) then ( stmt ) [ else stmt ]?
| while( expr ) do stmt
| begin stmt [ ; stmt ]* end
I don't know how to translate the rule WHILE to bytecode. For now, I wrote this:
stmt
: ID ':''=' expr
{
if(st.lookupType($ID.text) != $expr.type) {
throw new IllegalArgumentException("Type error on variable: " + $ID.text + ".");
}
int var = st.lookupAddress($ID.text);
code.emit(Opcode.ISTORE, var);
}
| 'print' '(' expr ')'
{
if($expr.type == Type.INTEGER) {
code.emit(Opcode.PRINT);
}
if($expr.type == Type.BOOLEAN) {
code.emit(Opcode.BPRINT);
}
}
| 'if' expr
{
if($expr.type != Type.BOOLEAN)
throw new IllegalArgumentException("Type error in '( expr )': expr is not a boolean.");
int ltrue = code.newLabel();
}
'then'
{
code.emit(Opcode.LABEL, ltrue);
}
s1 = stmt ( 'else' s2 = stmt )?
| 'while' expr
{
if($expr.type != Type.BOOLEAN)
throw new IllegalArgumentException("Type error in '( expr )': expr is not a boolean.");
//Bytecode generator
// CODE...
}
'do'
(
s1 = stmt
{
int ltrue = code.newLabel();
}
)*
| 'begin' s1 = stmt ( ';' s2 = stmt )* 'end'
;
st is a symbol table, that is, a table that contains the type of a certain variable.
In my grammar can be only two types: INTEGER or BOOLEAN.
The method newLabel() does nothing more than create a new label that is a string such as L1, L2, L3, etc., simply incrementing a counter.
code is an instance of CodeGenerator that is a data structure (Vector <Instruction>) that stores the bytecode instructions generated during the parsing of a program.
The Instruction class is composed of two fields (opcode and operand) and represents a single byte code instruction.
How do I generate the bytecode of a loop such as WHILE?
Thanks
Modified code:
'while'
{
int lloop = code.newLabel(); //label for loop
int ldone = code.newLabel(); //laber for done
code.emit(Opcode.LABEL, lloop);
}
expr
{
if($expr.type != Type.BOOLEAN)
throw new IllegalArgumentException("Type error in '( expr )': expr is not a boolean.");
code.emit(Opcode.GOTO, ldone);
}
'do' ( stmt )*
code.emit(Opcode.GOTO, lloop);
code.emit(Opcode.LABEL, ldone);
}
The simplest expression of a while loop is:
LOOP:
<condition>
jmp_if_false DONE;
<body>
jmp LOOP
DONE:
which might translate in pseudocode into:
'while' {
Create loop_label
Create done_label
Emit Label loop_label
}
expr {
Emit Jump_If_False to done_label
}
'do' stmt* {
Emit Jump to loop_label
Emit Label done_label
}
There are a variety of possible optimizations, but that's a start at least.
Related
I am using the following ANTLR grammar to define a function.
definition_function
: DEFINE FUNCTION function_name '[' language_name ']'
RETURN attribute_type '{' function_body '}'
;
function_name
: id
;
language_name
: id
;
function_body
: SCRIPT
;
SCRIPT
: '{' ('\u0020'..'\u007e' | ~( '{' | '}' ) )* '}'
{ setText(getText().substring(1, getText().length()-1)); }
;
But when I try to parse two functions like below,
define function concat[Scala] return string {
var concatenatedString = ""
for(i <- 0 until data.length) {
concatenatedString += data(i).toString
}
concatenatedString
};
define function concat[JavaScript] return string {
var str1 = data[0];
var str2 = data[1];
var str3 = data[2];
var res = str1.concat(str2,str3);
return res;
};
Then ANTLR doesn't parse this like two function definitions, but like a single function with the following body,
var concatenatedString = ""
for(i <- 0 until data.length) {
concatenatedString += data(i).toString
}
concatenatedString
};
define function concat[JavaScript] return string {
var str1 = data[0];
var str2 = data[1];
var str3 = data[2];
var res = str1.concat(str2,str3);
return res;
Can you explain this behavior? The body of the function can have anything in it. How can I correctly define this grammar?
Your rule matches that much because '\u0020'..'\u007e' from the rule '{' ('\u0020'..'\u007e' | ~( '{' | '}' ) )* '}' matches both { and }.
Your rule should work if you define it like this:
SCRIPT
: '{' ( SCRIPT | ~( '{' | '}' ) )* '}'
;
However, this will fail when the script block contains, says, strings or comments that contain { or }. Here is a way to match a SCRIPT token, including comments and string literals that could contain { and '}':
SCRIPT
: '{' SCRIPT_ATOM* '}'
;
fragment SCRIPT_ATOM
: ~[{}]
| '"' ~["]* '"'
| '//' ~[\r\n]*
| SCRIPT
;
A complete grammar that properly parses your input would then look like this:
grammar T;
parse
: definition_function* EOF
;
definition_function
: DEFINE FUNCTION function_name '[' language_name ']' RETURN attribute_type SCRIPT ';'
;
function_name
: ID
;
language_name
: ID
;
attribute_type
: ID
;
DEFINE
: 'define'
;
FUNCTION
: 'function'
;
RETURN
: 'return'
;
ID
: [a-zA-Z_] [a-zA-Z_0-9]*
;
SCRIPT
: '{' SCRIPT_ATOM* '}'
;
SPACES
: [ \t\r\n]+ -> skip
;
fragment SCRIPT_ATOM
: ~[{}]
| '"' ~["]* '"'
| '//' ~[\r\n]*
| SCRIPT
;
which also parses the following input properly:
define function concat[JavaScript] return string {
for (;;) {
while (true) { }
}
var s = "}"
// }
return s
};
Unless you absolutely need SCRIPT to be a token (recognized by a lexer rule), you can use a parser rule which recognizes nested blocks (the block rule below). The grammar included here should parse your example as two distinct function definitions.
DEFINE : 'define';
FUNCTION : 'function';
RETURN : 'return';
ID : [A-Za-z]+;
ANY : . ;
WS : [ \r\t\n]+ -> skip ;
test : definition_function* ;
definition_function
: DEFINE FUNCTION function_name '[' language_name ']'
RETURN attribute_type block ';'
;
function_name : id ;
language_name : id ;
attribute_type : 'string' ;
id : ID;
block
: '{' ( ( ~('{'|'}') )+ | block)* '}'
;
I'm trying to generate a Lexer/Parser through a simple grammar using ANTLR. What I've done right now :
grammar ExprV2;
#header {
package mypack.parte2;
}
#lexer::header {
package mypack.parte2;
}
start
: expr EOF { System.out.println($expr.val);}
;
expr returns [int val]
: term e=exprP[$term.val] { $val = $e.val; }
;
exprP[int i] returns [int val]
: { $val = $i; }
| '+' term e=exprP[$i + $term.val] { $val = $e.val; }
| '-' term e=exprP[$i - $term.val] { $val = $e.val; }
;
term returns [int val]
: fact e=termP[$fact.val] { $val = $e.val; }
;
termP[int i] returns [int val]
: {$val = $i;}
| '*' fact e=termP[$i * $fact.val] {$val = $e.val; }
| '/' fact e=termP[$i / $fact.val] {$val = $e.val; }
;
fact returns [int val]
: '(' expr ')' { $val = $expr.val; }
| NUM { $val=Integer.parseInt($NUM.text); }
;
NUM : '0'..'9'+ ;
WS : (' ' | '\t' |'\r' | '\n')+ { skip(); };
What I would like to obtain is to generate a Lexer/Parser with a grammar written using EBNF but I'm stucked and I don't know how to go ahead. I looked on the internet but I did not succeed in finding anything clear. Thanks to all!
I'm trying to do the following rewrite of the multiplication operator as repeated additions:
(* a t=INT) -> (+ a (+ a (+ a (+ ... + a) ... )) (t times)
Is there a way to do this in a single pass in ANTLR using a tree rewrite rule?
If not, what is the best way to go about it?
I have to do this rewriting multiple times, for each occurrence of '*', and the corresponding t's are parsed. Therefore, there is no fixed bound on the t's.
I managed to solve the problem in multiple passes. I compute the max number of passes while parsing the expression and apply the tree rewrite rules multiple times. I don't even need backtrack to be true. See code below.
Expr.g -> lexer, parser grammar
grammar Expr;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
MULT='*';
ADD='+';
}
#header{
import java.lang.Math;
}
#members {
public int limit=0;
}
prog : expr {limit=$expr.value;} ;
expr returns [int value]
: a=multExpr {$value=$a.value;} (ADD^ b=multExpr {$value=Math.max($value, $b.value);})* ;
multExpr returns [int value]
: primary {$value=$primary.value;} (MULT^ c=INT {$value=Math.max($value, $c.int);})? ;
primary returns[int value]
: ID {$value = 0;}
| '('! expr ')'! {$value = $expr.value;}
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
WS : (' '|'\r'|'\n')+ {skip();} ;
Eval.g -> tree rewrite grammar with main program
tree grammar Eval;
options {
tokenVocab=Expr;
ASTLabelType=CommonTree;
output=AST;
}
#members {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
ExprLexer lexer = new ExprLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExprParser parser = new ExprParser(tokens);
CommonTree t = null;
try {
t = (CommonTree) parser.prog().getTree();
} catch(RecognitionException re){
re.printStackTrace();
}
System.out.println("Tree: " + t.toStringTree());
System.out.println();
int loops = parser.limit;
System.out.println("Number of loops:" + loops);
System.out.println();
for(int i=0; i<loops; i++) {
System.out.println("Loop:" + (i+1));
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Eval s = new Eval(nodes);
t = (CommonTree)s.prog().getTree();
System.out.println("Simplified tree: "+t.toStringTree());
System.out.println();
}
}
}
prog : expr ;
expr
: ^(ADD a=expr b=expr)
| ^(MULT a=expr t=INT) ( {$t.int>1}?=> -> ^(ADD["+"] $a ^(MULT["*"] $a INT[String.valueOf($t.int - 1)]))
| {$t.int==1}?=> -> $a )
| INT
| ID
;
I have actually two questions that I hope can be answered as they are semi-dependent on my work. Below is the grammar + tree grammar + Java test file.
What I am actually trying to achieve is the following:
Question 1:
I have a grammar that parses my language correctly. I would like to do some semantic checks on variable declarations. So I created a tree walker and so far it semi works. My problem is it's not capturing the whole string of expression. For example,
float x = 10 + 10;
It is only capturing the first part, i.e. 10. I am not sure what I am doing wrong. If I did it in one pass, it works. Somehow, if I split the work into a grammar and tree grammar, it is not capturing the whole string.
Question 2:
I would like to do a check on a rule such that if my conditions returns true, I would like to remove that subtree. For example,
float x = 10;
float x; // <================ I would like this to be removed.
I have tried using rewrite rules but I think it is more complex than that.
Test.g:
grammar Test;
options {
language = Java;
output = AST;
}
parse : varDeclare+
;
varDeclare : type id equalExp? ';'
;
equalExp : ('=' (expression | '...'))
;
expression : binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
TestTree.g:
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
}
#members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
parse[SemanticCheck s]
: varDeclare+
;
varDeclare : type id equalExp? ';'
{s.check($type.name, $id.text, $equalExp.expr);}
;
equalExp returns [String expr]
: ('=' (expression {$expr = $expression.e;} | '...' {$expr = "...";}))
;
expression returns [String e]
#after {$e = $expression.text;}
: binaryExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
type returns [String name]
#after { $name = $type.text; }
: 'int'
| 'float'
;
Java test file, Test.java:
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RuleReturnScope;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10+y; \n" +
"float x; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
//TestLexer lexer = new TestLexer(new ANTLRFileStream("input.txt"));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
RuleReturnScope r = parser.parse();
System.out.println("Parse Tree:\n" + tokenStream.toString());
CommonTree t = (CommonTree)r.getTree();
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
walker.parse(s);
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
System.out.println("Remove statement! Already defined!");
return true;
}
names.add(variableName);
return false;
}
}
Thanks in advance!
I figured out my problem and it turns out I needed to build an AST first before I can do anything. This would help in understanding what is a flat tree look like vs building an AST.
How to output the AST built using ANTLR?
Thanks to Bart's endless examples here in StackOverFlow, I was able to do semantic predicates to do what I needed in the example above.
Below is the updated code:
Test.g
grammar Test;
options {
language = Java;
output = AST;
}
tokens {
VARDECL;
Assign = '=';
EqT = '==';
NEq = '!=';
LT = '<';
LTEq = '<=';
GT = '>';
GTEq = '>=';
NOT = '!';
PLUS = '+';
MINUS = '-';
MULT = '*';
DIV = '/';
}
parse : varDeclare+
;
varDeclare : type id equalExp ';' -> ^(VARDECL type id equalExp)
;
equalExp : (Assign^ (expression | '...' ))
;
expression : binaryExpression
;
binaryExpression : addingExpression ((EqT|NEq|LTEq|GTEq|LT|GT)^ addingExpression)*
;
addingExpression : multiplyingExpression ((PLUS|MINUS)^ multiplyingExpression)*
;
multiplyingExpression : unaryExpression
((MULT|DIV)^ unaryExpression)*
;
unaryExpression: ((NOT|MINUS))^ primitiveElement
| primitiveElement
;
primitiveElement : literalExpression
| id
| '(' expression ')' -> expression
;
literalExpression : INT
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
This should automatically build an AST whenever you have varDeclare. Now on to the tree grammar/walker.
TestTree.g
tree grammar TestTree;
options {
language = Java;
tokenVocab = Test;
ASTLabelType = CommonTree;
output = AST;
}
tokens {
REMOVED;
}
#members {
SemanticCheck s;
public TestTree(TreeNodeStream input, SemanticCheck s) {
this(input);
this.s = s;
}
}
start[SemanticCheck s] : varDeclare+
;
varDeclare : ^(VARDECL type id equalExp)
-> {s.check($type.text, $id.text, $equalExp.text)}? REMOVED
-> ^(VARDECL type id equalExp)
;
equalExp : ^(Assign expression)
| ^(Assign '...')
;
expression : ^(('!') expression)
| ^(('+'|'-'|'*'|'/') expression expression*)
| ^(('=='|'<='|'<'|'>='|'>'|'!=') expression expression*)
| literalExpression
;
literalExpression : INT
| id
;
id : IDENTIFIER
;
type : 'int'
| 'float'
;
Now on to test it:
Test.java
import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
SemanticCheck s = new SemanticCheck();
String src =
"float x = 10; \n" +
"int x = 1; \n";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
TestParser.parse_return r = parser.parse();
System.out.println("Tree:" + ((Tree)r.tree).toStringTree() + "\n");
CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)r.tree);
nodes.setTokenStream(tokenStream);
TestTree walker = new TestTree(nodes, s);
TestTree.start_return r2 = walker.start(s);
System.out.println("\nTree Walker: "+((Tree)r2.tree).toStringTree());
}
}
class SemanticCheck {
List<String> names;
public SemanticCheck() {
this.names = new ArrayList<String>();
}
public boolean check(String type, String variableName, String exp) {
System.out.println("Type: " + type + " variableName: " + variableName + " exp: " + exp);
if(names.contains(variableName)) {
return true;
}
names.add(variableName);
return false;
}
}
Output:
Tree:(VARDECL float x (= 10)) (VARDECL int x (= 1))
Type: float variableName: x exp: = 10
Type: int variableName: x exp: = 1
Tree Walker: (VARDECL float x (= 10)) REMOVED
Hope this helps! Please feel free to point any errors if I did something wrong.
I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be included. I need it to output an AST. So for input like this:
1 2 / 3 4
I want the AST to be
(* (/ (* 1 2) 3) 4)
I've hit upon the following, which uses java code to create the appropriate nodes:
grammar TestProd;
options {
output = AST;
}
tokens {
PROD;
}
DIV : '/';
multExpr: (INTEGER -> INTEGER)
( {div = null;}
div=DIV? b=INTEGER
->
^({$div == null ? (Object)adaptor.create(PROD, "*") : (Object)adaptor.create(DIV, "/")}
$multExpr $b))*
;
INTEGER: ('0' | '1'..'9' '0'..'9'*);
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };
This works. But is there a better/simpler way?
Here's a way:
grammar Test;
options {
backtrack=true;
output=AST;
}
tokens {
MUL;
DIV;
}
parse
: expr* EOF
;
expr
: (atom -> atom)
( '/' a=atom -> ^(DIV $expr $a)
| a=atom -> ^(MUL $expr $a)
)*
;
atom
: Number
| '(' expr ')' -> expr
;
Number
: '0'..'9'+
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String[] args) throws Exception {
String source = "1 2 / 3 4";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.parse_return result = parser.parse();
Tree tree = (Tree)result.getTree();
System.out.println(tree.toStringTree());
}
}
produced:
(MUL (DIV (MUL 1 2) 3) 4)