ANTLR - test class not found error when debugging - antlr

I am getting error of Test class not found error, even when I have made it via command
java org.antlr.Tool something.g
when debugging via ANTLRworks. I have been searching this on the web, but no success with it. Do you know, how to solve my problem?
Thanks
Edit - grammar:
grammar Expr;
prog : stat+ ;
stat : expr NEWLINE
| ID '=' expr NEWLINE
| NEWLINE
;
expr : multExpr (('+' |'-' ) multExpr)*;
multExpr: atom ('*' atom)*
;
atom : INT
| ID
| '(' expr ')'
;
ID : ('a'..'z' |'A'..'Z' )+ ;
INT : '0'..'9' + ;
NEWLINE:'\r' ? '\n' ;
WS : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;
Error message:
When debugging with imput text 3*4 makes output - "Error: Could not find or load main class Test" While, Test is generated in subfolder /output...

Since you commented in the comments that running a test class from the console also failed, here's one that works:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ExprLexer lex = new ExprLexer(new ANTLRStringStream("1 + 2 - 3 * 4 - E\n"));
CommonTokenStream tokens = new CommonTokenStream(lex);
ExprParser parser = new ExprParser(tokens);
parser.prog();
}
}
Now put your ANTLR JAR (v3.3 in my example) in the same directory as Main.java and Expr.g and run the main class from your console like this:
java -cp antlr-3.3.jar org.antlr.Tool Expr.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
The fact that nothing is printed to your console means that the parsing went without any problems.

Related

antlr3ide generates parsers and lexers without package info?

antlr3ide seems to generate parser and lexer files without the package info where the java files are located (such as package tour.trees;, here the relative path folder tour/trees contains the corresponding files ExprParser.java and ExprLexer.java).
The official forum seems a bit inactive and the documentation gives me not much help:(
Below is a sample grammar file Expr.g:
grammar Expr;
options {
language = Java;
}
prog : stat+;
stat : expr NEWLINE
| ID '=' expr NEWLINE
| NEWLINE
;
expr: multiExpr (('+'|'-') multiExpr)*
;
multiExpr : atom('*' atom)*
;
atom : INT
| ID
| '(' expr ')'
;
ID : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+;
NEWLINE : '\r'?'\n';
WS : (' '|'\t'|'\n'|'\r')+{skip();};
The package declaration is not something that antlrv3ide generates. This is done by ANTLR. To let ANTLR generate source files in the package tour.trees, add #header blocks containing the package declarations in your grammar file like this:
grammar Expr;
options {
language = Java;
}
// placed _after_ the `options`-block!
#parser::header { package tour.trees; }
#lexer::header { package tour.trees; }
prog : stat+;
...

Parsing a templating language

I'm trying to parse a templating language and I'm having trouble correctly parsing the arbitrary html that can appear between tags. So far what I have is below, any suggestions? An example of a valid input would be
{foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}This Should Be Parsed as a Buffer.{/bar2}
And the grammar is:
grammar g;
options {
language=Java;
output=AST;
ASTLabelType=CommonTree;
}
/* LEXER RULES */
tokens {
}
LD : '{';
RD : '}';
LOOP : '#';
END_LOOP: '/';
PARTIAL : '>';
fragment DIGIT : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');
IDENT : (LETTER | '_') (LETTER | '_' | DIGIT)*;
BUFFER options {greedy=false;} : ~(LD | RD)+ ;
/* PARSER RULES */
start : body EOF
;
body : (tag | loop | partial | BUFFER)*
;
tag : LD! IDENT^ RD!
;
loop : LD! LOOP^ IDENT RD!
body
LD! END_LOOP! IDENT RD!
;
partial : LD! PARTIAL^ IDENT RD!
;
buffer : BUFFER
;
Your lexer tokenizes independently from your parser. If your parser tries to match a BUFFER token, the lexer does not take this info into account. In your case with input like: "blah blah blah", the lexer creates 3 IDENT tokens, not a single BUFFER token.
What you need to "tell" your lexer is that when you're inside a tag (i.e. you encountered a LD tag), a IDENT token should be created, and when you're outside a tag (i.e. you encountered a RD tag), a BUFFER token should be created instead of an IDENT token.
In order to implement this, you need to:
create a boolean flag inside the lexer that keeps track of the fact that you're in- or outside a tag. This can be done inside the #lexer::members { ... } section of your grammar;
after the lexer either creates a LD- or RD-token, flip the boolean flag from (1). This can be done in the #after{ ... } section of the lexer rules;
before creating a BUFFER token inside the lexer, check if you're outside a tag at the moment. This can be done by using a semantic predicate at the start of your lexer rule.
A short demo:
grammar g;
options {
output=AST;
ASTLabelType=CommonTree;
}
#lexer::members {
private boolean insideTag = false;
}
start
: body EOF -> body
;
body
: (tag | loop | partial | BUFFER)*
;
tag
: LD IDENT RD -> IDENT
;
loop
: LD LOOP IDENT RD body LD END_LOOP IDENT RD -> ^(LOOP body IDENT IDENT)
;
partial
: LD PARTIAL IDENT RD -> ^(PARTIAL IDENT)
;
LD #after{insideTag=true;} : '{';
RD #after{insideTag=false;} : '}';
LOOP : '#';
END_LOOP : '/';
PARTIAL : '>';
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
IDENT : (LETTER | '_') (LETTER | '_' | DIGIT)*;
BUFFER : {!insideTag}?=> ~(LD | RD)+;
fragment DIGIT : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');
(note that you probably want to discard spaces between tag, so I added a SPACE rule and discarded these spaces)
Test it with the following class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "{foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}" +
"This Should Be Parsed as a Buffer.{/bar2}";
gLexer lexer = new gLexer(new ANTLRStringStream(src));
gParser parser = new gParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
and after running the main class:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool g.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Windows
java -cp antlr-3.3.jar org.antlr.Tool g.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
You'll see some DOT-source being printed to the console, which corresponds to the following AST:
(image created using graphviz-dev.appspot.com)

In ANTLR, how do you specify a specific number of repetitions?

I'm using ANTLR to specify a file format that contains lines that cannot exceed 254 characters (excluding line endings). How do I encode this in the grammer, short of doing:
line : CHAR? CHAR? CHAR? CHAR? ... (254 times)
This can be handled by using a semantic predicate.
First write your grammar in such a way that it does not matter how long your lines are. An example would look like this:
grammar Test;
parse
: line* EOF
;
line
: Char+ (LineBreak | EOF)
| LineBreak // empty line!
;
LineBreak : '\r'? '\n' | '\r' ;
Char : ~('\r' | '\n') ;
and then add the "predicate" to the line rule:
grammar Test;
#parser::members {
public static void main(String[] args) throws Exception {
String source = "abcde\nfghij\nklm\nnopqrst";
ANTLRStringStream in = new ANTLRStringStream(source);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
parser.parse();
}
}
parse
: line* EOF
;
line
: (c+=Char)+ {$c.size()<=5}? (LineBreak | EOF)
| LineBreak // empty line!
;
LineBreak : '\r'? '\n' | '\r' ;
Char : ~('\r' | '\n') ;
The c+=Char will construct an ArrayList containing all characters in the line. The {$c.size()<=5}? causes to throw an exception when the ArrayList's size exceeds 5.
I also added a main method in the parser so you can test it yourself:
// *nix/MacOSX
java -cp antlr-3.2.jar org.antlr.Tool Test.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar TestParser
// Windows
java -cp antlr-3.2.jar org.antlr.Tool Test.g
javac -cp antlr-3.2.jar *.java
java -cp .;antlr-3.2.jar TestParser
which will output:
line 0:-1 rule line failed predicate: {$c.size()<=5}?
HTH

ANTLR grammar from bison

I'm trying to translate a grammar from bison to ANTLR. The grammar itself is pretty simple in bison but I cannot find a simple way for doing this.
Grammar in bison:
expr = expr or expr | expr and expr | (expr)
Any hints/links/pointers are welcome.
Thanks,
Iulian
In ANTLR3, you cannot create left recursive rules (ANTLR4 can handle left recursion in certain cases):
a : a b
;
tail recursion is fine:
a : b a
;
For more information on left recursive rules, see ANTLR's Wiki.
So, your example could look like:
parse
: expr+ EOF
;
expr
: orExpr
;
orExpr
: andExpr ('or' andExpr)*
;
andExpr
: atom ('and' atom)*
;
atom
: Boolean
| '(' expr ')'
;
Boolean
: 'true'
| 'false'
;
Here's a small demo in Java:
grammar BoolExp;
#members {
public static void main(String[] args) throws Exception {
if(args.length != 1) {
System.out.println("Usage:");
System.out.println(" - Windows : java -cp .;antlr-3.2.jar BoolExpParser \"EXPRESSION\"");
System.out.println(" - *nix/MacOS : java -cp .:antlr-3.2.jar BoolExpParser \"EXPRESSION\"");
System.exit(0);
}
ANTLRStringStream in = new ANTLRStringStream(args[0]);
BoolExpLexer lexer = new BoolExpLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
BoolExpParser parser = new BoolExpParser(tokens);
parser.parse();
}
}
parse
: e=expr EOF {System.out.println($e.bool);}
;
expr returns [boolean bool]
: e=orExpr {$bool = $e.bool;}
;
orExpr returns [boolean bool]
: e1=andExpr {$bool = $e1.bool;}
('or' e2=andExpr {$bool = $bool || $e2.bool;}
)*
;
andExpr returns [boolean bool]
: e1=atom {$bool = $e1.bool;}
('and' e2=atom {$bool = $bool && $e2.bool;}
)*
;
atom returns [boolean bool]
: b=Boolean {$bool = new Boolean($b.text).booleanValue();}
| '(' e=expr ')' {$bool = $e.bool;}
;
Boolean
: 'true'
| 'false'
;
Space
: (' ' | '\t' | '\n' | '\r') {skip();}
;
First create a lexer & parser (1) and then compile all source files (2). Finally, execute the BoolExpParser class (3).
1
// Windows & *nix/MacOS
java -cp antlr-3.2.jar org.antlr.Tool BoolExp.g
2
// Windows
javac -cp .;antlr-3.2.jar *.java
// *nix/MacOS
javac -cp .:antlr-3.2.jar *.java
3
// Windows
java -cp .;antlr-3.2.jar BoolExpParser "false and true or true"
// *nix/MacOS
java -cp .:antlr-3.2.jar BoolExpParser "false and true or true"
Terence Parr's ANTLR reference is the book on ANTLR. And Scott created some excellent video tutorials on ANTLR 3 (with Eclipse).

Using antlr to parse a | separated file

So I think this should be easy, but I'm having a tough time with it. I'm trying to parse a | delimited file, and any line that doesn't start with a | is a comment. I guess I don't understand how comments work. It always errors out on a comment line. This is a legacy file, so there's no changing it. Here's my grammar.
grammar Route;
#header {
package org.benheath.codegeneration;
}
#lexer::header {
package org.benheath.codegeneration;
}
file: line+;
line: route+ '\n';
route: ('|' elt) {System.out.println("element: [" + $elt.text + "]");} ;
elt: (ELEMENT)*;
COMMENT: ~'|' .* '\n' ;
ELEMENT: ('a'..'z'|'A'..'Z'|'0'..'9'|'*'|'_'|'#'|'#') ;
WS: (' '|'\t') {$channel=HIDDEN;} ; // ignore whitespace
Data:
! a comment
Another comment
| a | abc | b | def | ...
A grammar for that would look like this:
parse
: line* EOF
;
line
: ( comment | values ) ( NL | EOF )
;
comment
: ELEMENT+
;
values
: PIPE ( ELEMENT PIPE )+
;
PIPE
: '|'
;
ELEMENT
: ('a'..'z')+
;
NL
: '\r'? '\n' | '\r'
;
WS
: (' '|'\t') {$channel=HIDDEN;}
;
And to test it, you just need to sprinkle a bit of code in your grammar like this:
grammar Route;
#members {
List<List<String>> values = new ArrayList<List<String>>();
}
parse
: line* EOF
;
line
: ( comment | v=values {values.add($v.line);} ) ( NL | EOF )
;
comment
: ELEMENT+
;
values returns [List<String> line]
#init {line = new ArrayList<String>();}
: PIPE ( e=ELEMENT {line.add($e.text);} PIPE )*
;
PIPE
: '|'
;
ELEMENT
: ('a'..'z')+
;
NL
: '\r'? '\n' | '\r'
;
WS
: (' '|'\t') {$channel=HIDDEN;}
;
Now generate a lexer/parser by invoking:
java -cp antlr-3.2.jar org.antlr.Tool Route.g
create a class RouteTest.java:
import org.antlr.runtime.*;
import java.util.List;
public class RouteTest {
public static void main(String[] args) throws Exception {
String data =
"a comment\n"+
"| xxxxx | y | zzz |\n"+
"another comment\n"+
"| a | abc | b | def |";
ANTLRStringStream in = new ANTLRStringStream(data);
RouteLexer lexer = new RouteLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
RouteParser parser = new RouteParser(tokens);
parser.parse();
for(List<String> line : parser.values) {
System.out.println(line);
}
}
}
Compile all source files:
javac -cp antlr-3.2.jar *.java
and run the class RouteTest:
// Windows
java -cp .;antlr-3.2.jar RouteTest
// *nix/MacOS
java -cp .:antlr-3.2.jar RouteTest
If all goes well, you see this printed to your console:
[xxxxx, y, zzz]
[a, abc, b, def]
Edit: note that I simplified it a bit by only allowing lower case letters, you can always expand the set of course.
It's a nice idea to use ANTLR for a job like this, although I do think it's overkill. For example, it would be very easy to (in pseudo-code):
for each line in file:
if line begins with '|':
fields = /|\s*([a-z]+)\s*/g
Edit: Well, you can't express the distinction between comments and lines lexically, because there is nothing lexical that distinguishes them. A hint to get you in one workable direction.
line: comment | fields;
comment: NONBAR+ (BAR|NONBAR+) '\n';
fields = (BAR NONBAR)+;
This seems to work, I swear I tried it. Changing comment to lower case switched it to the parser vs the lexer, I still don't get it.
grammar Route;
#header {
package org.benheath.codegeneration;
}
#lexer::header {
package org.benheath.codegeneration;
}
file: (line|comment)+;
line: route+ '\n';
route: ('|' elt) {System.out.println("element: [" + $elt.text + "]");} ;
elt: (ELEMENT)*;
comment : ~'|' .* '\n';
ELEMENT: ('a'..'z'|'A'..'Z'|'0'..'9'|'*'|'_'|'#'|'#') ;
WS: (' '|'\t') {$channel=HIDDEN;} ; // ignore whitespace