Example to parse expression: atom ((PLUS | MINUS) atom)* - antlr

Calculator.g4
grammar Calculator;
expression: atom ((PLUS | MINUS) atom)*;
atom: '1';
PLUS: '+';
MINUS: '-';
WS: [ \r\n\t]+ -> skip;
Are there any examples that explain how to parse the pattern atom ((PLUS | MINUS) atom)*?
I have searched many blogs that teach how to parse simple grammars with the visitor and listener models, but none of them shows how to handle a pattern like atom ((PLUS | MINUS) atom)*.
What confuses me is that the PLUS/MINUS tokens can differ. I can use the method AllAtom to range over the atoms, but there is no method to get the related PLUS/MINUS. There should be a list of PLUS/MINUS tokens, which may differ from each other.

You can accomplish it by slightly rewriting the grammar. I extracted plus/minus into a separate rule.
grammar Calculator;
expression: atom (sign atom)*;
sign: PLUS | MINUS;
atom: '1';
PLUS: '+';
MINUS: '-';
WS: [ \r\n\t]+ -> skip;
And here would be the corresponding listener:
public class Calculator extends CalculatorBaseListener {
private int accum;
private char sign;
@Override
public void enterAtom(CalculatorParser.AtomContext ctx) {
int value = Integer.parseInt(ctx.getText());
switch (sign) {
case '\0':
accum = value;
break;
case '+':
accum += value;
break;
case '-':
accum -= value;
break;
}
}
@Override
public void enterSign(CalculatorParser.SignContext ctx) {
sign = ctx.getText().charAt(0);
}
@Override
public void exitExpression(CalculatorParser.ExpressionContext ctx) {
System.out.println(accum);
}
}
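If rewriting the grammar is not an option, the operators can also be recovered from the order of the children: in atom ((PLUS | MINUS) atom)* the children alternate atom, operator, atom, and so on. The sketch below is plain Java, with a list of strings standing in for the child texts of the expression context. The class and method names are hypothetical, not part of the ANTLR API; in real code the same sequence would come from the context's children in parse order.

```java
import java.util.Arrays;
import java.util.List;

public class InterleavedChildren {
    // children: child texts in parse order, e.g. ["1", "+", "1", "-", "1"].
    // Index 0 is the first atom; each following pair is (operator, atom).
    static int evaluate(List<String> children) {
        int accum = Integer.parseInt(children.get(0));
        for (int i = 1; i + 1 < children.size(); i += 2) {
            String op = children.get(i);
            int value = Integer.parseInt(children.get(i + 1));
            if (op.equals("+")) {
                accum += value;
            } else if (op.equals("-")) {
                accum -= value;
            } else {
                throw new IllegalArgumentException("unexpected operator: " + op);
            }
        }
        return accum;
    }

    public static void main(String[] args) {
        // 1 + 1 - 1 + 1
        System.out.println(evaluate(Arrays.asList("1", "+", "1", "-", "1", "+", "1")));
    }
}
```

Because every operator sits directly before the atom it applies to, no separate operator list is needed.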

Related

How to write a lexer rule that references a character?

I want to create a lexer rule that can read a string literal that defines its own delimiter (specifically, the Oracle quote-delimited string):
q'!My string which can contain 'single quotes'!'
where the ! serves as the delimiter, but can in theory be any character.
Is it possible to do this via a lexer rule, without introducing a dependency on a given language target?
No, target-dependent code is needed for such a thing.
In case you, or someone else reading this Q&A, are wondering how this can be done using target code, here's a quick demo:
lexer grammar TLexer;
@members {
boolean ahead(String text) {
for (int i = 0; i < text.length(); i++) {
if (_input.LA(i + 1) != text.charAt(i)) {
return false;
}
}
return true;
}
}
TEXT
: [nN]? ( ['] ( [']['] | ~['] )* [']
| [qQ] ['] QUOTED_TEXT [']
)
;
// Skip everything other than TEXT tokens
OTHER
: . -> skip
;
fragment QUOTED_TEXT
: '[' ( {!ahead("]'")}? . )* ']'
| '{' ( {!ahead("}'")}? . )* '}'
| '<' ( {!ahead(">'")}? . )* '>'
| '(' ( {!ahead(")'")}? . )* ')'
| . ( {!ahead(getText().charAt(0) + "'")}? . )* .
;
which can be tested with the class:
public class Main {
static void test(String input) {
TLexer lexer = new TLexer(new ANTLRInputStream(input));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
tokenStream.fill();
System.out.printf("input: `%s`\n", input);
for (Token token : tokenStream.getTokens()) {
if (token.getType() != TLexer.EOF) {
System.out.printf(" token: -> %s\n", token.getText());
}
}
System.out.println();
}
public static void main(String[] args) throws Exception {
test("foo q'!My string which can contain 'single quotes'!' bar");
test("foo q'(My string which can contain 'single quotes')' bar");
test("foo 'My string which can contain ''single quotes' bar");
}
}
which will print:
input: `foo q'!My string which can contain 'single quotes'!' bar`
token: -> q'!My string which can contain 'single quotes'!'
input: `foo q'(My string which can contain 'single quotes')' bar`
token: -> q'(My string which can contain 'single quotes')'
input: `foo 'My string which can contain ''single quotes' bar`
token: -> 'My string which can contain ''single quotes'
The . in the alternative
| . ( {!ahead(getText().charAt(0) + "'")}? . )* .
might be a bit too permissive, but that can be tweaked by replacing it with a negated, or regular character set.
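To make the matching idea explicit outside of ANTLR, here is a minimal hand-written sketch in plain Java. It assumes the same convention as the grammar above: the character after q' picks the delimiter, bracket-like openers pair with their closing counterpart, and the literal ends at the first occurrence of the closing delimiter followed by a single quote. The class QuoteScanner and the method endOfQuoted are hypothetical names used only for illustration.

```java
public class QuoteScanner {
    // Returns the index just past the closing quote of the q-quoted literal
    // that starts at 'start', or -1 if 'start' is not a q-quoted literal
    // or no terminator is found.
    static int endOfQuoted(String s, int start) {
        if (start < 0 || start + 2 >= s.length()) return -1;
        char q = s.charAt(start);
        if ((q != 'q' && q != 'Q') || s.charAt(start + 1) != '\'') return -1;
        char open = s.charAt(start + 2);
        char close;
        switch (open) {
            case '[': close = ']'; break;
            case '{': close = '}'; break;
            case '<': close = '>'; break;
            case '(': close = ')'; break;
            default:  close = open; // any other character closes itself
        }
        String terminator = close + "'";
        int end = s.indexOf(terminator, start + 3);
        return end < 0 ? -1 : end + terminator.length();
    }

    public static void main(String[] args) {
        String input = "foo q'!My string which can contain 'single quotes'!' bar";
        int start = input.indexOf("q'");
        int end = endOfQuoted(input, start);
        System.out.println(input.substring(start, end));
    }
}
```

The indexOf call plays the role of the ahead(...) predicate: scanning stops at the first position where the closing delimiter and a quote appear together.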

Using a grammar with a visitor to calculate arithmetic expressions

We've been given a grammar in class that looks like this:
grammar Calculator;
@header {
import java.util.*;
}
@parser::members {
/** "memory" for our calculator; variable/value pairs go here */
Map<String, Double> memory = new HashMap<String, Double>();
}
statlist : stat+ ;
stat : vgl NL #printCompare
| ass NL #printAssign
| NL #blank
;
ass : <assoc=right> VAR ('=') vgl #assign
;
vgl : sum(op=('<'|'>') sum)* #compare
;
sum : prod(op=('+'|'-') prod)* #addSub
;
prod : pot(op=('*'|'/') pot)* #mulDiv
;
pot :<assoc=right> term(op='^' pot)? #poten
;
term : '+' term #add
| '-' term #subtract
| '(' sum ')' #parens
| VAR #var
| INT #int
;
/*Rules for the lexer */
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
BIG : '>' ;
SML : '<' ;
POT : '^' ;
VAR : [a-zA-Z]+ ;
NL : [\n] ;
INT : [0-9]+ ;
WS : [ \r\t]+ -> skip ; // skip spaces, tabs
I am having problems translating constructs like these
sum : prod(op=('+'|'-') prod)* #addSub
into working code. Currently the corresponding method looks like this:
/** prod(op=('+'|'-') prod)* */
@Override
public Double visitAddSub(CalculatorParser.AddSubContext ctx) {
double left = visit(ctx.prod(0));
if(ctx.op == null){
return left;
}
double right = visit(ctx.prod(1));
return (ctx.op.getType() == CalculatorParser.ADD) ? left+right : left-right;
}
Current output would look like this
3+3+3
6.0
which is obviously wrong. How do I get my visitor to visit the nodes correctly without touching the grammar?
Take a look at the rule:
prod(op=('+'|'-') prod)*
See that *? It means that what's inside the parentheses can come up 0 or more times.
Your visitor code assumes there will be only one or two prod children, and no more. That's why you see 6.0: the parser put 3+3+3 into the context, but your visitor only processed 3+3 and left the final +3 out.
So just use a while loop over all the op and prod children, and accumulate them into the result.
Okay, with the help of Lucas and the use of op+= I managed to fix my problem. It looks pretty complicated, but it works.
/** prod(op+=('+'|'-') prod)* */
@Override
public Double visitAddSub(CalculatorParser.AddSubContext ctx) {
Stack<Double> temp = new Stack<Double>();
switch(ctx.children.size()){
case 1: return visit(ctx.prod(0));
default:
Double ret = 0.0;
for(int i = 0; i < ctx.op.size(); i++){
if(ctx.op.get(i).getType()==CalculatorParser.ADD){
if(temp.isEmpty()) {
ret = visit(ctx.prod(i)) + visit(ctx.prod(i+1));
temp.push(ret);
} else {
ret = temp.pop() + visit(ctx.prod(i+1));
temp.push(ret);
}
} else {
if(temp.isEmpty()) {
ret = visit(ctx.prod(i)) - visit(ctx.prod(i+1));
temp.push(ret);
} else {
ret = temp.pop() - visit(ctx.prod(i+1));
temp.push(ret);
}
}
}
}
return temp.pop();
}
We are using a switch-case to determine how many children this context has. If it's more than 3, we have at least 2 operators. We then use the individual operators and a stack to determine the result.
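The stack above never holds more than one value, so a single running total is probably enough. The following is a minimal sketch in plain Java rather than visitor code: the list operands stands in for the results of visit(ctx.prod(i)) and the list ops for the operator tokens collected by op+=. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class LeftFold {
    // Folds operands left to right; operands must have exactly one more
    // element than ops. Any operator other than '+' is treated as '-'.
    static double fold(List<Double> operands, List<Character> ops) {
        double result = operands.get(0);
        for (int i = 0; i < ops.size(); i++) {
            double right = operands.get(i + 1);
            result = (ops.get(i) == '+') ? result + right : result - right;
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 + 3 + 3
        System.out.println(fold(Arrays.asList(3.0, 3.0, 3.0), Arrays.asList('+', '+')));
    }
}
```

With no operators the loop body never runs and the first operand is returned, so the single-child case needs no special handling.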

ANTLR v3 Treewalker class. How to evaluate right associative function such as Factorial

I'm trying to build an expression evaluator with ANTLR v3 but I can't get the factorial function because it is right associative.
This is the code:
class ExpressionParser extends Parser;
options { buildAST=true; }
imaginaryTokenDefinitions :
SIGN_MINUS
SIGN_PLUS;
expr : LPAREN^ sumExpr RPAREN! ;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : runary (POW^ runary)? ;
runary : unary (FAT)?;
unary : (SIN^|COS^|TAN^|LOG^|LN^|RAD^)* signExpr;
signExpr : (
m:MINUS^ {#m.setType(SIGN_MINUS);}
| p:PLUS^ {#p.setType(SIGN_PLUS);}
)? atom ;
atom : NUMBER | expr ;
class ExpressionLexer extends Lexer;
PLUS : '+' ;
MINUS : '-' ;
MUL : '*' ;
DIV : '/' ;
MOD : '%' ;
POW : '^' ;
SIN : 's' ;
COS : 'c' ;
TAN : 't' ;
LOG : 'l' ;
LN : 'n' ;
RAD : 'r' ;
FAT : 'f' ;
LPAREN: '(' ;
RPAREN: ')' ;
SEMI : ';' ;
protected DIGIT : '0'..'9' ;
NUMBER : (DIGIT)+ ('.' (DIGIT)+)?;
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; int i,f=1; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| #(SIN a=expr ) { r=Math.sin(a); }
| #(COS a=expr ) { r=Math.cos(a); }
| #(TAN a=expr ) { r=Math.tan(a); }
| #(LOG a=expr ) { r=Math.log10(a); }
| #(LN a=expr ) { r=Math.log(a); }
| #(RAD a=expr ) { r=Math.sqrt(a); }
| #(FAT a=expr ) { for(i=1; i<=a; i++){f=f*i;}; r=(double)f;}
| #(LPAREN a=expr) { r=a; }
| #(SIGN_MINUS a=expr) { r=-1*a; }
| #(SIGN_PLUS a=expr) { if(a<0)r=0-a; else r=a; }
| d:NUMBER { r=Double.parseDouble(d.getText()); } ;
If I change the FAT matching case in the TreeWalker class to something like this:
| #(a=expr FAT ) { for(i=1; i<=a; i++){f=f*i;}; r=(double)f;}
I get these errors:
Expression.g:56:7: rule classDef trapped:
Expression.g:56:7: unexpected token: a
error: aborting grammar 'ExpressionTreeWalker' due to errors
Exiting due to errors.
Your tree walker (the original one) is fine, as far as I can see.
However, you probably need to mark FAT in the grammar:
runary : unary (FAT^)?;
(Note the hat ^, as in all the other productions.)
Edit:
As explained in the ANTLR 3 wiki, the hat operator is needed to make the node the "root of subtree created for entire enclosing rule even if nested in a subrule". In this case, the factorial operator (FAT) is nested in an optional subrule ((FAT)?). That's independent of whether the operator is prefix or postfix.
Note that in your grammar the factorial operator is not right-associative, since a!! is not valid at all. But I would say that associativity is only meaningful for infix operators.

ANTLR rewrite tree node as variable depth tree

I'm trying to do the following rewrite of the multiplication operator as repeated additions:
(* a t=INT) -> (+ a (+ a (+ a (+ ... + a) ... )) (t times)
Is there a way to do this in a single pass in ANTLR using a tree rewrite rule?
If not, what is the best way to go about it?
I have to do this rewriting multiple times, for each occurrence of '*', and the corresponding t's are parsed. Therefore, there is no fixed bound on the t's.
I managed to solve the problem in multiple passes. I compute the max number of passes while parsing the expression and apply the tree rewrite rules multiple times. I don't even need backtrack to be true. See code below.
Expr.g -> lexer, parser grammar
grammar Expr;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
MULT='*';
ADD='+';
}
@header {
import java.lang.Math;
}
@members {
public int limit=0;
}
prog : expr {limit=$expr.value;} ;
expr returns [int value]
: a=multExpr {$value=$a.value;} (ADD^ b=multExpr {$value=Math.max($value, $b.value);})* ;
multExpr returns [int value]
: primary {$value=$primary.value;} (MULT^ c=INT {$value=Math.max($value, $c.int);})? ;
primary returns[int value]
: ID {$value = 0;}
| '('! expr ')'! {$value = $expr.value;}
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
WS : (' '|'\r'|'\n')+ {skip();} ;
Eval.g -> tree rewrite grammar with main program
tree grammar Eval;
options {
tokenVocab=Expr;
ASTLabelType=CommonTree;
output=AST;
}
@members {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
ExprLexer lexer = new ExprLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExprParser parser = new ExprParser(tokens);
CommonTree t = null;
try {
t = (CommonTree) parser.prog().getTree();
} catch(RecognitionException re){
re.printStackTrace();
}
System.out.println("Tree: " + t.toStringTree());
System.out.println();
int loops = parser.limit;
System.out.println("Number of loops:" + loops);
System.out.println();
for(int i=0; i<loops; i++) {
System.out.println("Loop:" + (i+1));
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Eval s = new Eval(nodes);
t = (CommonTree)s.prog().getTree();
System.out.println("Simplified tree: "+t.toStringTree());
System.out.println();
}
}
}
prog : expr ;
expr
: ^(ADD a=expr b=expr)
| ^(MULT a=expr t=INT) ( {$t.int>1}?=> -> ^(ADD["+"] $a ^(MULT["*"] $a INT[String.valueOf($t.int - 1)]))
| {$t.int==1}?=> -> $a )
| INT
| ID
;
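For comparison, the end result of the repeated passes can be produced in one go by ordinary code instead of a rewrite rule. This is only a sketch of the target shape, not an ANTLR tree rewrite: strings stand in for tree nodes, the class MultExpander and the method expand are hypothetical, and t is assumed to be at least 1, as in the predicates above.

```java
public class MultExpander {
    // Builds the S-expression for 'a' added to itself t times (t >= 1),
    // nested to the right: t = 1 gives a, t = 3 gives (+ a (+ a a)).
    static String expand(String a, int t) {
        if (t < 1) {
            throw new IllegalArgumentException("t must be >= 1");
        }
        String result = a;
        for (int i = 1; i < t; i++) {
            // Each step wraps the previous result in one more addition.
            result = "(+ " + a + " " + result + ")";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("a", 4));
    }
}
```

Each loop iteration corresponds to one pass of the rewrite rule, which turns (* a t) into (+ a (* a t-1)) until t reaches 1.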

ANTLRWorks: Can't get operators to work

I've been trying to learn ANTLR for some time and finally got my hands on The Definitive ANTLR reference.
Well I tried the following in ANTLRWorks 1.4
grammar Test;
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
expression
: INT ('+'^ INT)*;
When I pass 2+4 and process expression, I don't get a tree with + as the root and 2 and 4 as the child nodes. Rather, I get expression as the root and 2, + and 4 as child nodes at the same level.
I can't figure out what I am doing wrong and desperately need help.
BTW, how can I get those graphic descriptions?
Yes, you get expression as the root because expression is your only rule, and that is what it returns.
I have just added a virtual token PLUS to your example, along with a rewrite expression that shows the result you are expecting.
But it seems that you have already found the solution :o)
grammar Test;
options {
output=AST;
ASTLabelType = CommonTree;
}
tokens {PLUS;}
@members {
public static void main(String [] args) {
try {
TestLexer lexer =
new TestLexer(new ANTLRStringStream("2+2"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
TestParser.expression_return p_result = parser.expression();
CommonTree ast = p_result.tree;
if( ast == null ) {
System.out.println("resultant tree: is NULL");
} else {
System.out.println("resultant tree: " + ast.toStringTree());
}
} catch(Exception e) {
e.printStackTrace();
}
}
}
expression
: INT ('+' INT)* -> ^(PLUS INT+);
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;