ANTLR grammar from bison

I'm trying to translate a grammar from bison to ANTLR. The grammar itself is pretty simple in bison, but I cannot find a simple way to do this in ANTLR.
Grammar in bison:
expr = expr or expr | expr and expr | (expr)
Any hints/links/pointers are welcome.
Thanks,
Iulian

In ANTLR3, you cannot create left recursive rules (ANTLR4 can handle direct left recursion, but not indirect left recursion):
a : a b
;
Tail (right) recursion is fine:
a : b a
;
For more information on left recursive rules, see ANTLR's Wiki.
So, your example could look like:
parse
: expr+ EOF
;
expr
: orExpr
;
orExpr
: andExpr ('or' andExpr)*
;
andExpr
: atom ('and' atom)*
;
atom
: Boolean
| '(' expr ')'
;
Boolean
: 'true'
| 'false'
;
Here's a small demo in Java:
grammar BoolExp;
@members {
public static void main(String[] args) throws Exception {
if(args.length != 1) {
System.out.println("Usage:");
System.out.println(" - Windows : java -cp .;antlr-3.2.jar BoolExpParser \"EXPRESSION\"");
System.out.println(" - *nix/MacOS : java -cp .:antlr-3.2.jar BoolExpParser \"EXPRESSION\"");
System.exit(0);
}
ANTLRStringStream in = new ANTLRStringStream(args[0]);
BoolExpLexer lexer = new BoolExpLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
BoolExpParser parser = new BoolExpParser(tokens);
parser.parse();
}
}
parse
: e=expr EOF {System.out.println($e.bool);}
;
expr returns [boolean bool]
: e=orExpr {$bool = $e.bool;}
;
orExpr returns [boolean bool]
: e1=andExpr {$bool = $e1.bool;}
('or' e2=andExpr {$bool = $bool || $e2.bool;}
)*
;
andExpr returns [boolean bool]
: e1=atom {$bool = $e1.bool;}
('and' e2=atom {$bool = $bool && $e2.bool;}
)*
;
atom returns [boolean bool]
: b=Boolean {$bool = new Boolean($b.text).booleanValue();}
| '(' e=expr ')' {$bool = $e.bool;}
;
Boolean
: 'true'
| 'false'
;
Space
: (' ' | '\t' | '\n' | '\r') {skip();}
;
First create a lexer & parser (1) and then compile all source files (2). Finally, execute the BoolExpParser class (3).
1
// Windows & *nix/MacOS
java -cp antlr-3.2.jar org.antlr.Tool BoolExp.g
2
// Windows
javac -cp .;antlr-3.2.jar *.java
// *nix/MacOS
javac -cp .:antlr-3.2.jar *.java
3
// Windows
java -cp .;antlr-3.2.jar BoolExpParser "false and true or true"
// *nix/MacOS
java -cp .:antlr-3.2.jar BoolExpParser "false and true or true"
Terence Parr's ANTLR reference is the book on ANTLR. And Scott created some excellent video tutorials on ANTLR 3 (with Eclipse).
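For readers who want to see the precedence layering without generating a parser, here is a minimal hand-written sketch of the same idea in plain Java. It is not ANTLR output: the class name BoolExpSketch and the whitespace-based tokenizing are assumptions made for illustration. Each method mirrors one rule of the grammar above, and the loops replace the left recursion of the bison grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written counterpart of the BoolExp grammar; not ANTLR output.
class BoolExpSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    BoolExpSketch(String input) {
        // Pad parentheses with spaces so a whitespace split yields one token each.
        String padded = input.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : padded.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    static boolean eval(String input) {
        BoolExpSketch p = new BoolExpSketch(input);
        boolean result = p.orExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    // orExpr : andExpr ('or' andExpr)*
    private boolean orExpr() {
        boolean value = andExpr();
        while (peekIs("or")) {
            pos++;
            boolean rhs = andExpr(); // always consume the operand, then combine
            value = value || rhs;
        }
        return value;
    }

    // andExpr : atom ('and' atom)*
    private boolean andExpr() {
        boolean value = atom();
        while (peekIs("and")) {
            pos++;
            boolean rhs = atom();
            value = value && rhs;
        }
        return value;
    }

    // atom : Boolean | '(' expr ')'
    private boolean atom() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String t = tokens.get(pos++);
        if (t.equals("true")) return true;
        if (t.equals("false")) return false;
        if (t.equals("(")) {
            boolean value = orExpr();
            if (!peekIs(")")) throw new IllegalArgumentException("expected ')'");
            pos++;
            return value;
        }
        throw new IllegalArgumentException("unexpected token: " + t);
    }

    private boolean peekIs(String expected) {
        return pos < tokens.size() && tokens.get(pos).equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(eval("false and true or true")); // prints true
    }
}
```

Because andExpr is called from inside orExpr, 'and' binds tighter than 'or', which is the same effect the rule nesting has in the ANTLR grammar.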

Related

How do I convert this Antlr3 AST to Antlr4?

I'm trying to convert my existing Antlr3 project to Antlr4 to get more functionality. I have this grammar that wouldn't compile with Antlr4.9
expr
: term ( OR^ term )* ;
and
factor
: ava | NOT^ factor | (LPAREN! expr RPAREN!) ;
Mostly because Antlr4 doesn't support ^ and ! anymore. From the documentation it seems like those are
AST root operator. When generating abstract syntax trees (ASTs), token
references suffixed with the "^" root operator force AST nodes to be
created and added as the root of the current tree. This symbol is only
effective when the buildAST option is set. More information about ASTs
is also available.
AST exclude operator. When generating abstract syntax trees, token
references suffixed with the "!" exclude operator are not included in
the AST constructed for that rule. Rule references can also be
suffixed with the exclude operator, which implies that, while the tree
for the referenced rule is constructed, it is not linked into the tree
for the referencing rule. This symbol is only effective when the
buildAST option is set. More information about ASTs is also available.
If I take those out it compiles, but I'm not sure what they mean or how Antlr4 supports them.
LPAREN and RPAREN are tokens:
tokens {
EQUALS = '=';
LPAREN = '(';
RPAREN = ')';
}
Antlr4's error messages kindly explain how to convert the tokens section, but say nothing about ^ and !. The grammar is for parsing boolean expressions, for example (a=b AND b=c)
I think this is the rule
targetingexpr returns [boolean value]
: expr { $value = $expr.value; } ;
expr returns [boolean value]
: ^(NOT a=expr) { $value = !a; }
| ^(AND a=expr b=expr) { $value = a && b; }
| ^(OR a=expr b=expr) { $value = a || b; }
| ^(EQUALS A=ALPHANUM B=ALPHANUM) { $value = targetingContext.contains($A.text,$B.text); }
;
The v3 grammar:
...
tokens {
EQUALS = '=';
LPAREN = '(';
RPAREN = ')';
}
...
expr
: term ( OR^ term )* ;
factor
: ava | NOT^ factor | (LPAREN! expr RPAREN!) ;
in v4 would look like this:
...
expr
: term ( OR term )* ;
factor
: ava | NOT factor | (LPAREN expr RPAREN) ;
EQUALS : '=';
LPAREN : '(';
RPAREN : ')';
So, just remove the inline ^ and ! operators (tree rewriting is no longer available in ANTLR4), and move the literal tokens from the tokens { ... } section into their own lexer rules.
I think this is the rule
targetingexpr returns [boolean value]
: expr { $value = $expr.value; } ;
expr returns [boolean value]
: ^(NOT a=expr) { $value = !a; }
| ^(AND a=expr b=expr) { $value = a && b; }
| ^(OR a=expr b=expr) { $value = a || b; }
| ^(EQUALS A=ALPHANUM B=ALPHANUM) { $value = targetingContext.contains($A.text,$B.text); }
;
What you posted there is part of a tree grammar, for which there is no equivalent in ANTLR4. In ANTLR4 you'd use a visitor to evaluate your expressions instead of doing so inside a tree grammar.
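To illustrate the shape of visitor-based evaluation, here is a minimal self-contained sketch in plain Java. It does not use the classes ANTLR4 would generate (those would be context classes and a base visitor derived from your grammar); the names VisitorSketch, Node and Evaluator are hypothetical, and the Map standing in for targetingContext is an assumption, since the original class is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node and visitor types; ANTLR4 would generate its own classes.
class VisitorSketch {
    interface Node { <T> T accept(Visitor<T> v); }

    interface Visitor<T> {
        T visitNot(Not n);
        T visitAnd(And n);
        T visitOr(Or n);
        T visitEquals(Equals n);
    }

    static final class Not implements Node {
        final Node operand;
        Not(Node operand) { this.operand = operand; }
        public <T> T accept(Visitor<T> v) { return v.visitNot(this); }
    }

    static final class And implements Node {
        final Node left, right;
        And(Node left, Node right) { this.left = left; this.right = right; }
        public <T> T accept(Visitor<T> v) { return v.visitAnd(this); }
    }

    static final class Or implements Node {
        final Node left, right;
        Or(Node left, Node right) { this.left = left; this.right = right; }
        public <T> T accept(Visitor<T> v) { return v.visitOr(this); }
    }

    static final class Equals implements Node {
        final String key, value;
        Equals(String key, String value) { this.key = key; this.value = value; }
        public <T> T accept(Visitor<T> v) { return v.visitEquals(this); }
    }

    // One visit method per alternative of the old tree grammar rule.
    static final class Evaluator implements Visitor<Boolean> {
        private final Map<String, String> context; // assumed stand-in for targetingContext
        Evaluator(Map<String, String> context) { this.context = context; }
        public Boolean visitNot(Not n) { return !n.operand.accept(this); }
        public Boolean visitAnd(And n) { return n.left.accept(this) && n.right.accept(this); }
        public Boolean visitOr(Or n) { return n.left.accept(this) || n.right.accept(this); }
        public Boolean visitEquals(Equals n) { return n.value.equals(context.get(n.key)); }
    }

    static boolean evaluate(Node tree, Map<String, String> context) {
        return tree.accept(new Evaluator(context));
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new HashMap<>();
        ctx.put("a", "b");
        ctx.put("b", "c");
        // Corresponds to the input (a=b AND b=c)
        Node tree = new And(new Equals("a", "b"), new Equals("b", "c"));
        System.out.println(evaluate(tree, ctx));
    }
}
```

With a real ANTLR4 grammar, the tree would come from the generated parser and the visit methods would receive generated context objects, but the division of labour is the same: the grammar only describes structure, and the evaluation lives in the visitor.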

Antlr superfluous Predicate required?

I have a file where I want to ignore parts of it. In the Lexer I use gated semantic predicates to avoid creating tokens for the uninteresting part of the file. My rules are similar to the following.
A
: {!ignore}?=> 'A'
;
START_IGNORE
: 'foo' {ignore = true; skip();}
;
END_IGNORE
: 'oof' {ignore = false; skip();}
;
IGNORE
: {ignore}?=> . {skip();}
;
However, unless I change START_IGNORE and END_IGNORE to also use semantic predicates (as below), it does not work.
A
: {!ignore}?=> 'A'
;
START_IGNORE
: {true}?=> 'foo' {ignore = true; skip();}
;
END_IGNORE
: {true}?=> 'oof' {ignore = false; skip();}
;
IGNORE
: {ignore}?=> . {skip();}
;
Why do I have to add the predicates?
EDIT: I am using antlr-3.4
Why do I have to add the predicates?
You don't. At least, not using ANTLR v3.3. I don't know how exactly you're testing, but don't use ANTLRWorks' interpreter or the Eclipse ANTLR IDE plugin. Always do a little test from the command line.
grammar T;
@parser::members {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("A foo A B C oof A"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
}
}
@lexer::members {
private boolean ignore = false;
}
parse
: (t=.
{System.out.printf("[\%02d] type=\%s text='\%s'\n", $t.getCharPositionInLine(), tokenNames[$t.type], $t.text);}
)* EOF
;
A
: {!ignore}?=> 'A'
;
START_IGNORE
: 'foo' {ignore = true; skip();}
;
END_IGNORE
: 'oof' {ignore = false; skip();}
;
IGNORE
: {ignore}?=> . {skip();}
;
SPACE
: ' ' {skip();}
;
Run it like this:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar TParser
which will print the following:
[00] type=A text='A'
[16] type=A text='A'
I.e. from the input "A foo A B C oof A", the part "foo A B C oof" is skipped.
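The effect of the ignore flag can also be sketched as a hand-written scanner in plain Java. This is only an illustration, not what ANTLR generates: the class name IgnoreFlagSketch is hypothetical, and rule selection is modelled by the order of the if-branches, which merely approximates ANTLR's prediction. It shows that the start and end markers need no guard of their own for the flag to work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner mirroring the T grammar's lexer rules; not ANTLR output.
class IgnoreFlagSketch {
    // Returns "position:text" for every emitted A token.
    static List<String> scan(String input) {
        List<String> emitted = new ArrayList<>();
        boolean ignore = false; // same role as the lexer member
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("foo", i)) {        // START_IGNORE, unguarded
                ignore = true;
                i += 3;
            } else if (input.startsWith("oof", i)) { // END_IGNORE, unguarded
                ignore = false;
                i += 3;
            } else if (ignore) {                     // IGNORE : {ignore}?=> . {skip();}
                i++;
            } else if (input.charAt(i) == 'A') {     // A : {!ignore}?=> 'A'
                emitted.add(i + ":A");
                i++;
            } else if (input.charAt(i) == ' ') {     // SPACE
                i++;
            } else {
                throw new IllegalArgumentException("no rule matches at index " + i);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(scan("A foo A B C oof A")); // prints [0:A, 16:A]
    }
}
```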

ANTLR - test class not found error when debugging

I am getting a "Test class not found" error when debugging via ANTLRWorks, even though I generated the parser with the command
java org.antlr.Tool something.g
I have been searching the web for this, but without success. Do you know how to solve my problem?
Thanks
Edit - grammar:
grammar Expr;
prog : stat+ ;
stat : expr NEWLINE
| ID '=' expr NEWLINE
| NEWLINE
;
expr : multExpr (('+' |'-' ) multExpr)*;
multExpr: atom ('*' atom)*
;
atom : INT
| ID
| '(' expr ')'
;
ID : ('a'..'z' |'A'..'Z' )+ ;
INT : '0'..'9' + ;
NEWLINE:'\r' ? '\n' ;
WS : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;
Error message:
Debugging with the input text 3*4 produces the output "Error: Could not find or load main class Test", even though Test is generated in the subfolder /output...
Since you mentioned in the comments that running a test class from the console also failed, here's one that works:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ExprLexer lex = new ExprLexer(new ANTLRStringStream("1 + 2 - 3 * 4 - E\n"));
CommonTokenStream tokens = new CommonTokenStream(lex);
ExprParser parser = new ExprParser(tokens);
parser.prog();
}
}
Now put your ANTLR JAR (v3.3 in my example) in the same directory as Main.java and Expr.g and run the main class from your console like this:
java -cp antlr-3.3.jar org.antlr.Tool Expr.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
The fact that nothing is printed to your console means that the parsing went without any problems.

Parsing a templating language

I'm trying to parse a templating language and I'm having trouble correctly parsing the arbitrary html that can appear between tags. So far what I have is below, any suggestions? An example of a valid input would be
{foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}This Should Be Parsed as a Buffer.{/bar2}
And the grammar is:
grammar g;
options {
language=Java;
output=AST;
ASTLabelType=CommonTree;
}
/* LEXER RULES */
tokens {
}
LD : '{';
RD : '}';
LOOP : '#';
END_LOOP: '/';
PARTIAL : '>';
fragment DIGIT : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');
IDENT : (LETTER | '_') (LETTER | '_' | DIGIT)*;
BUFFER options {greedy=false;} : ~(LD | RD)+ ;
/* PARSER RULES */
start : body EOF
;
body : (tag | loop | partial | BUFFER)*
;
tag : LD! IDENT^ RD!
;
loop : LD! LOOP^ IDENT RD!
body
LD! END_LOOP! IDENT RD!
;
partial : LD! PARTIAL^ IDENT RD!
;
buffer : BUFFER
;
Your lexer tokenizes independently from your parser. If your parser tries to match a BUFFER token, the lexer does not take this info into account. In your case with input like: "blah blah blah", the lexer creates 3 IDENT tokens, not a single BUFFER token.
What you need to "tell" your lexer is that when you're inside a tag (i.e. you encountered an LD token), an IDENT token should be created, and when you're outside a tag (i.e. you encountered an RD token), a BUFFER token should be created instead of an IDENT token.
In order to implement this, you need to:
1. create a boolean flag inside the lexer that keeps track of whether you're inside or outside a tag. This can be done inside the @lexer::members { ... } section of your grammar;
2. after the lexer creates either an LD or an RD token, flip the boolean flag from (1). This can be done in the @after{ ... } section of the lexer rules;
3. before creating a BUFFER token inside the lexer, check if you're outside a tag at the moment. This can be done by using a semantic predicate at the start of your lexer rule.
A short demo:
grammar g;
options {
output=AST;
ASTLabelType=CommonTree;
}
@lexer::members {
private boolean insideTag = false;
}
start
: body EOF -> body
;
body
: (tag | loop | partial | BUFFER)*
;
tag
: LD IDENT RD -> IDENT
;
loop
: LD LOOP IDENT RD body LD END_LOOP IDENT RD -> ^(LOOP body IDENT IDENT)
;
partial
: LD PARTIAL IDENT RD -> ^(PARTIAL IDENT)
;
LD @after{insideTag=true;} : '{';
RD @after{insideTag=false;} : '}';
LOOP : '#';
END_LOOP : '/';
PARTIAL : '>';
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
IDENT : (LETTER | '_') (LETTER | '_' | DIGIT)*;
BUFFER : {!insideTag}?=> ~(LD | RD)+;
fragment DIGIT : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');
(Note that you probably want to discard spaces between tags, so I added a SPACE rule that puts them on the hidden channel.)
Test it with the following class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "{foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}" +
"This Should Be Parsed as a Buffer.{/bar2}";
gLexer lexer = new gLexer(new ANTLRStringStream(src));
gParser parser = new gParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
and after running the main class:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool g.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Windows
java -cp antlr-3.3.jar org.antlr.Tool g.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
You'll see some DOT-source being printed to the console, which corresponds to the following AST:
(image created using graphviz-dev.appspot.com)
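The insideTag idea can also be tried out without ANTLR. The following plain-Java scanner is a rough sketch of the same state-dependent tokenizing; the class name TagLexerSketch and the string form of the tokens are assumptions for illustration, and it treats everything outside a tag as BUFFER, which is a simplification of what the generated lexer does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner showing the insideTag flag; not ANTLR output.
class TagLexerSketch {
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        boolean insideTag = false; // flipped after LD and RD, as in the @after blocks
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '{') {
                out.add("LD"); insideTag = true; i++;
            } else if (c == '}') {
                out.add("RD"); insideTag = false; i++;
            } else if (!insideTag) {
                // BUFFER : {!insideTag}?=> ~(LD | RD)+
                int start = i;
                while (i < src.length() && src.charAt(i) != '{' && src.charAt(i) != '}') i++;
                out.add("BUFFER(" + src.substring(start, i) + ")");
            } else if (c == '#') {
                out.add("LOOP"); i++;
            } else if (c == '/') {
                out.add("END_LOOP"); i++;
            } else if (c == '>') {
                out.add("PARTIAL"); i++;
            } else if (isLetter(c) || c == '_') {
                // IDENT : (LETTER | '_') (LETTER | '_' | DIGIT)*
                int start = i;
                while (i < src.length()
                        && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                out.add("IDENT(" + src.substring(start, i) + ")");
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // SPACE inside a tag is dropped
            } else {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{foo}{#bar}blah blah blah{zed}{/bar}"));
    }
}
```

The point to notice is that "blah blah blah" comes out as a single BUFFER token, because the scanner consults the flag before it considers the IDENT rule.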

In ANTLR, how do you specify a specific number of repetitions?

I'm using ANTLR to specify a file format that contains lines that cannot exceed 254 characters (excluding line endings). How do I encode this in the grammar, short of doing:
line : CHAR? CHAR? CHAR? CHAR? ... (254 times)
This can be handled by using a semantic predicate.
First write your grammar in such a way that it does not matter how long your lines are. An example would look like this:
grammar Test;
parse
: line* EOF
;
line
: Char+ (LineBreak | EOF)
| LineBreak // empty line!
;
LineBreak : '\r'? '\n' | '\r' ;
Char : ~('\r' | '\n') ;
and then add the "predicate" to the line rule:
grammar Test;
@parser::members {
public static void main(String[] args) throws Exception {
String source = "abcde\nfghij\nklm\nnopqrst";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
parser.parse();
}
}
parse
: line* EOF
;
line
: (c+=Char)+ {$c.size()<=5}? (LineBreak | EOF)
| LineBreak // empty line!
;
LineBreak : '\r'? '\n' | '\r' ;
Char : ~('\r' | '\n') ;
The c+=Char will construct an ArrayList containing all characters in the line. The {$c.size()<=5}? predicate causes the parser to report an error when the ArrayList's size exceeds 5 (the demo uses 5 instead of 254 to keep the test input short).
I also added a main method in the parser so you can test it yourself:
// *nix/MacOSX
java -cp antlr-3.2.jar org.antlr.Tool Test.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar TestParser
// Windows
java -cp antlr-3.2.jar org.antlr.Tool Test.g
javac -cp antlr-3.2.jar *.java
java -cp .;antlr-3.2.jar TestParser
which will output:
line 0:-1 rule line failed predicate: {$c.size()<=5}?
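If it is enough to know which lines are too long, the same check can be sketched outside the grammar in plain Java. This is an alternative illustration, not part of the ANTLR solution above; the class and method names are hypothetical, and the limit is passed in so that 254 can be used for the real format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; performs the length check without ANTLR.
class LineLengthSketch {
    // Returns the 1-based numbers of all lines longer than maxLength,
    // not counting the line endings themselves.
    static List<Integer> tooLongLines(String source, int maxLength) {
        List<Integer> bad = new ArrayList<>();
        // Same line breaks as the LineBreak rule: "\r\n", "\n" or a lone "\r".
        String[] lines = source.split("\r\n|\r|\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].length() > maxLength) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String source = "abcde\nfghij\nklm\nnopqrst";
        System.out.println(tooLongLines(source, 5)); // prints [4]
    }
}
```

For the demo input this flags the last line, "nopqrst", which is also the line on which the predicate in the grammar fails.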
HTH