I am trying to create my first ANTLR3 tree grammar, but I keep hitting the same problem. The output of the parser is:
$ echo 'foo, bar' | ./run.sh
foo bar
TreeGrammar.g: node from line 0:0 required (...)+ loop did not match anything at input 'EOF'
Exception in thread "main" java.lang.NullPointerException
at Driver.main(Driver.java:29)
The output clearly shows that the stage-1 parser produces the right tokens ('foo' and 'bar'). Somehow the stage-2 tree parser refuses to parse the results from stage 1. Since the code is very basic, it must be some simple, dumb oversight on my part ;-)
Here's my simple test code:
Grammar.g:
grammar Grammar;
options {
output = AST;
}
statement: word (','! word)* EOF!;
word: ID;
ID: ('a'..'z'|'A'..'Z')+;
WS: (' ' | '\t' | '\n' | '\r')+ { $channel = HIDDEN; } ;
TreeGrammar.g:
tree grammar TreeGrammar;
options {
tokenVocab = Grammar;
ASTLabelType = CommonTree;
output = template;
}
statement: word+;
word: ID;
Driver.java:
import java.io.*;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Driver {
public static void main(String[] args) throws Exception {
FileReader groupFileR = new FileReader("Template.stg" );
StringTemplateGroup templates = new StringTemplateGroup(groupFileR);
groupFileR.close();
ANTLRInputStream input = new ANTLRInputStream(System.in);
GrammarLexer lexer = new GrammarLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
GrammarParser parser = new GrammarParser(tokens);
GrammarParser.statement_return result = parser.statement();
CommonTree t = (CommonTree)result.getTree();
System.out.println(t.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
nodes.setTokenStream(tokens);
TreeGrammar walker = new TreeGrammar(nodes);
walker.setTemplateLib(templates);
walker.statement();
TreeGrammar.statement_return r2 = walker.statement();
StringTemplate output = (StringTemplate) r2.getTemplate();
System.out.println(output.toString());
}
}
Assuming your StringTemplate group is properly formed, your problem is most probably the fact that you walk your AST twice:
walker.statement();
TreeGrammar.statement_return r2 = walker.statement();
That is, you call walker.statement() twice. This is what the (first) error is telling you:
TreeGrammar.g: node from line 0:0 required (...)+ loop did not match anything at input 'EOF'
You consume the input once with the first walker.statement() call, which leaves the node stream at its end (EOF). The second walker.statement() call then expects to walk word+ again, yet there's only an EOF left.
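The effect can be reproduced without ANTLR. The sketch below is only an analogy: `WordWalker` is a hypothetical stand-in for the generated tree parser, and its `statement()` method mimics `word+` over a one-shot stream of nodes. The first call consumes everything; the second finds nothing left and fails, much like the "(...)+ loop did not match anything at input 'EOF'" message.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a generated tree walker; this is not ANTLR code.
class WordWalker {
    private final Iterator<String> nodes;

    WordWalker(List<String> words) {
        this.nodes = words.iterator();
    }

    // Mimics the rule `statement: word+;` and returns how many words matched.
    int statement() {
        if (!nodes.hasNext()) {
            throw new IllegalStateException(
                "required (...)+ loop did not match anything at input 'EOF'");
        }
        int matched = 0;
        while (nodes.hasNext()) {
            nodes.next();
            matched++;
        }
        return matched;
    }

    public static void main(String[] args) {
        WordWalker walker = new WordWalker(Arrays.asList("foo", "bar"));
        System.out.println("first call matched " + walker.statement() + " words");
        try {
            walker.statement(); // second call: the stream is already exhausted
        } catch (IllegalStateException e) {
            System.out.println("second call failed: " + e.getMessage());
        }
    }
}
```

In the driver, the equivalent fix is to delete the bare walker.statement(); line and keep only the call whose result is assigned to r2.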
The input file that I want to parse consists of one or more rows and each row consists of a greeting (Hello or Greeting) followed by a person's first name. Here is a sample input:
Hello Roger
Greeting Sally
I want to create a parser that outputs XML. For the sample input I want the parser to generate this XML:
<messages>
<message>
<greeting>Hello</greeting>
<person>Roger</person>
</message>
<message>
<greeting>Greeting</greeting>
<person>Sally</person>
</message>
</messages>
I want the XML generated directly within the parser file (MyParser.g4) using Java's System.out.println.
Here is my lexer:
lexer grammar MyLexer;
GREETING : ('Hello' | 'Greeting') ;
ID : [a-zA-Z]+ ;
WS : [ \t]+ -> skip ;
EOL : [\n] ;
Here is my parser:
parser grammar MyParser;
options { tokenVocab=MyLexer; }
document: (message+ {System.out.println("<messages>" + $message.value + "</messages>");});
message returns [String value]: (GREETING ID {value = "<message><greeting>" + $GREETING.text + "</greeting><name>" + $ID.text + "</name></message>";}) ;
I ran ANTLR on the lexer and parser and then compiled the Java code that ANTLR generated. This resulted in the following error message.
MyParser.java:154: error: cannot find symbol
value = "<message><greeting>" + (((MessageContext)_localctx).GREETING!=null?((MessageContext)_localctx).GREETING.getText():null) + "</greeting><name>" + (((MessageContext)_localctx).ID!=null?((MessageContext)_localctx).ID.getText():null) + "</name></message>";
^
symbol: variable value
location: class MyParser
What am I doing wrong, please?
You forgot the $ before value, it must be: $value = "<message><greeting>" + ….
And you also want to print every message, so not:
message+ {System.out.println( … );}
which will print just once, but like this instead:
(message {System.out.println( … );})+
This ought to do it:
parser grammar MyParser;
options { tokenVocab=MyLexer; }
document
: {System.out.println("<messages>");}
( message EOL? {
System.out.println(" " + $message.value);
})+
{System.out.println("</messages>");}
EOF
;
message returns [String value]
: GREETING ID {
$value = "<message><greeting>" + $GREETING.text + "</greeting><name>" + $ID.text + "</name></message>";
}
;
Can be tested like this:
String source = "Hello Roger\nGreeting Sally";
MyLexer lexer = new MyLexer(CharStreams.fromString(source));
MyParser parser = new MyParser(new CommonTokenStream(lexer));
parser.document();
I am using the C# grammar with the Java target. I am parsing some C# code like this:
List<Token> codeTokens = new ArrayList<Token>();
List<Token> commentTokens = new ArrayList<Token>();
//CharStream cs = CharStreams.fromString(contents);
CharStream cs = CharStreams.fromPath(path);
CSharpLexer lexer = new CSharpLexer(cs);
// recognition error happens here:
List<? extends Token> tokens = lexer.getAllTokens();
List<Token> directiveTokens = new ArrayList<Token>();
ListTokenSource directiveTokenSource = new ListTokenSource(directiveTokens);
CommonTokenStream directiveTokenStream = new CommonTokenStream(directiveTokenSource, CSharpLexer.DIRECTIVE);
CSharpPreprocessorParser preprocessorParser = new CSharpPreprocessorParser(directiveTokenStream);
If my source code is ASCII encoded, it works fine. But if it's UNICODE, even if there's nothing in the file, I always get this error:
line 1:0 token recognition error at: ''
Do I need to configure my Lexer differently? The error comes from Lexer.java => getAllTokens() => nextToken() => getInterpreter().match(_input, _mode);
Again, I get this even with an empty UNICODE-encoded file - but it still contains the U+FEFF character:
$ less ApiUserInfo.cs
<U+FEFF>
ApiUserInfo.cs (END)
Thank you
Angel
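A byte order mark is one plausible culprit here: if no lexer rule matches U+FEFF, the very first character would be reported as a recognition error. A minimal sketch of one possible workaround, assuming the file is UTF-8 and that dropping a leading BOM is acceptable, is to read the text yourself and strip the BOM before it reaches the lexer. `BomStripper` and `stripBom` are hypothetical names introduced for this illustration; they are not part of ANTLR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ANTLR: removes a leading U+FEFF if present.
class BomStripper {
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // The UTF-8 bytes EF BB BF decode to the single character U+FEFF.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 'l', 'a', 's', 's'};
        String decoded = new String(raw, StandardCharsets.UTF_8);
        String cleaned = stripBom(decoded);
        System.out.println(decoded.length() + " -> " + cleaned.length());
        // The cleaned text could then be handed to the lexer, for example via
        // CharStreams.fromString(cleaned), as in the commented-out line of the question.
    }
}
```

Whether this resolves the error depends on the grammar; a lexer rule that skips the BOM would be an alternative.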
Can someone please post an example of using the union operator (|) with the VTD-XML parser?
The expression below does not work in the VTD-XML parser but works in the JXPath parser.
/a | /b
OK, here is a quick example of a union expression in action.
import com.ximpleware.AutoPilot;
import com.ximpleware.NavException;
import com.ximpleware.VTDException;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
/**
* An issue that seems to exist in VTD-XML 2.12 and 2.13 (but not 2.11) that causes lookups for default namespace nodes
* to also pickup non-default namespaced nodes.
*/
public class VtdAutoPilotXpathIssueTest {
private static final String XML = "<a xmlns:x=\"" + "urn:test" + "\"><b id=\"1\"/><x:b id=\"2\"/><b id=\"3\"/></a>";
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
vg.setDoc(XML.getBytes());
vg.parse(false);
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.declareXPathNameSpace("abc", "urn:test");
ap.selectXPath("//b|/a");
int i=0;
while((i=ap.evalXPath())!=-1){
System.out.println(" token "+ vn.toRawString(i));
}
}
}
I'm having trouble figuring out the ANTLR3 API so I can generate and use a parse tree in some JavaScript code. When I open the grammar file using ANTLRWorks (their IDE), the interpreter is able to show me the parse tree, and it's even correct.
I'm having a lot of difficulty tracking down resources on how to get this parse tree in my code using the ANTLR3 runtime. I've been messing around with the various functions in the runtime and Parser files, but to no avail:
var input = "(PR=5000)",
cstream = new org.antlr.runtime.ANTLRStringStream(input),
lexer = new TLexer(cstream),
tstream = new org.antlr.runtime.CommonTokenStream(lexer),
parser = new TParser(tstream);
var tree = parser.query().tree;
var nodeStream = new org.antlr.runtime.tree.CommonTreeNodeStream(tree);
nodeStream.setTokenStream(tstream);
parseTree = new org.antlr.runtime.tree.TreeParser(nodeStream);
Since antlrWorks can display the parse tree without any tree grammar from myself, and since I have read that antlr automatically generates a parse tree from the grammar file, I'm assuming that I can access this basic parse tree with some runtime functions that I am probably not aware of. Am I correct in this thinking?
HugeAntlrs wrote:
Since antlrWorks can display the parse tree without any tree grammar from myself, and since I have read that antlr automatically generates a parse tree from the grammar file, I'm assuming that I can access this basic parse tree with some runtime functions that I am probably not aware of. Am I correct in this thinking?
No, that is incorrect. ANTLR creates a flat, one-dimensional stream of tokens.
ANTLRWorks creates its own parse tree on the fly when interpreting some source. You have no access to this tree (not with JavaScript, nor even with Java). You will have to define the tokens that should be the roots of your (sub)trees and/or the tokens that need to be removed from your AST. Check out the following Q&A, which explains how to create a proper AST: How to output the AST built using ANTLR?
EDIT
Since there's no proper JavaScript demo on SO yet, here's a quick demo.
The following grammar parses boolean expressions with the following operators:
or
and
is
not
where not has the highest precedence.
Of course, there are true and false, and the expressions can be grouped using parentheses.
file: Exp.g
grammar Exp;
options {
output=AST;
language=JavaScript;
}
parse
: exp EOF -> exp
;
exp
: orExp
;
orExp
: andExp (OR^ andExp)*
;
andExp
: eqExp (AND^ eqExp)*
;
eqExp
: unaryExp (IS^ unaryExp)*
;
unaryExp
: NOT atom -> ^(NOT atom)
| atom
;
atom
: TRUE
| FALSE
| '(' exp ')' -> exp
;
OR : 'or' ;
AND : 'and' ;
IS : 'is' ;
NOT : 'not' ;
TRUE : 'true' ;
FALSE : 'false' ;
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;
The grammar above produces an AST which can be fed to the tree-walker below:
file: ExpWalker.g
tree grammar ExpWalker;
options {
tokenVocab=Exp;
ASTLabelType=CommonTree;
language=JavaScript;
}
// `walk` returns a string
walk returns [expr]
: exp {expr = ($exp.expr == 1) ? 'True' : 'False';}
;
// `exp` returns either 1 (true) or 0 (false)
exp returns [expr]
: ^(OR a=exp b=exp) {expr = ($a.expr == 1 || $b.expr == 1) ? 1 : 0;}
| ^(AND a=exp b=exp) {expr = ($a.expr == 1 && $b.expr == 1) ? 1 : 0;}
| ^(IS a=exp b=exp) {expr = ($a.expr == $b.expr) ? 1 : 0;}
| ^(NOT a=exp) {expr = ($a.expr == 1) ? 0 : 1;}
| TRUE {expr = 1;}
| FALSE {expr = 0;}
;
(apologies for the messy JavaScript code inside { ... }: I have very little experience with JavaScript!)
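For readers more at home in Java, the evaluation rules of the walker can be mirrored by a small recursive function over a hand-built tree. This is only a sketch: `BoolNode` and `eval` are hypothetical names, and no ANTLR classes are involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node: an operator or literal name plus its child nodes.
class BoolNode {
    final String type;
    final List<BoolNode> children;

    BoolNode(String type, BoolNode... children) {
        this.type = type;
        this.children = Arrays.asList(children);
    }

    // Mirrors the `exp` rule of the walker: returns 1 for true and 0 for false.
    static int eval(BoolNode n) {
        switch (n.type) {
            case "OR":
                return (eval(n.children.get(0)) == 1 || eval(n.children.get(1)) == 1) ? 1 : 0;
            case "AND":
                return (eval(n.children.get(0)) == 1 && eval(n.children.get(1)) == 1) ? 1 : 0;
            case "IS":
                return (eval(n.children.get(0)) == eval(n.children.get(1))) ? 1 : 0;
            case "NOT":
                return (eval(n.children.get(0)) == 1) ? 0 : 1;
            case "TRUE":
                return 1;
            case "FALSE":
                return 0;
            default:
                throw new IllegalArgumentException("unknown node type: " + n.type);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for "true and not false", i.e. ^(AND TRUE ^(NOT FALSE)).
        BoolNode tree = new BoolNode("AND",
            new BoolNode("TRUE"),
            new BoolNode("NOT", new BoolNode("FALSE")));
        System.out.println(eval(tree) == 1 ? "True" : "False");
    }
}
```

Each case corresponds to one alternative of the tree grammar rule, with the operator token as the root and its operands as children.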
Now download ANTLR 3.3 (no earlier version!) and the JavaScript runtime files:
http://www.antlr.org/download/antlr-3.3-complete.jar
http://www.antlr.org/download/antlr-javascript-runtime-3.1.zip
Rename antlr-3.3-complete.jar to antlr-3.3.jar, unzip antlr-javascript-runtime-3.1.zip, and store all files in the same folder as your Exp.g and ExpWalker.g files.
Now generate the lexer, parser and tree-walker:
java -cp antlr-3.3.jar org.antlr.Tool Exp.g
java -cp antlr-3.3.jar org.antlr.Tool ExpWalker.g
And test it all with the following html file:
<html>
<head>
<script type="text/javascript" src="antlr3-all-min.js"></script>
<script type="text/javascript" src="ExpLexer.js"></script>
<script type="text/javascript" src="ExpParser.js"></script>
<script type="text/javascript" src="ExpWalker.js"></script>
<script type="text/javascript">
function init() {
var evalButton = document.getElementById("eval");
evalButton.onclick = evalExpression;
}
function evalExpression() {
document.getElementById("answer").innerHTML = "";
var expression = document.getElementById("exp").value;
if(expression) {
var lexer = new ExpLexer(new org.antlr.runtime.ANTLRStringStream(expression));
var tokens = new org.antlr.runtime.CommonTokenStream(lexer);
var parser = new ExpParser(tokens);
var nodes = new org.antlr.runtime.tree.CommonTreeNodeStream(parser.parse().getTree());
nodes.setTokenStream(tokens);
var walker = new ExpWalker(nodes);
var value = walker.walk();
document.getElementById("answer").innerHTML = expression + " = " + value;
}
else {
document.getElementById("exp").value = "enter an expression here first";
}
}
</script>
</head>
<body onload="init()">
<input id="exp" type="text" size="35" />
<button id="eval">evaluate</button>
<div id="answer"></div>
</body>
</html>
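Entering an expression and clicking the evaluate button then displays the expression followed by its value (True or False) in the answer element.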
I'm just getting started with ANTLR and am trying to parse some patterns out of a log file.
For example, the log file:
7114422 2009-07-16 15:43:07,078
[LOGTHREAD] INFO StatusLog - Task 0
input :
uk.project.Evaluation.Input.Function1(selected=["red","yellow"]){}
7114437 2009-07-16 15:43:07,093
[LOGTHREAD] INFO StatusLog - Task 0
output :
uk.org.project.Evaluation.Output.Function2(selected=["Rocket"]){}
7114422 2009-07-16 15:43:07,078
[LOGTHREAD] INFO StatusLog - Task 0
input :
uk.project.Evaluation.Input.Function3(selected=["blue","yellow"]){}
7114437 2009-07-16 15:43:07,093
[LOGTHREAD] INFO StatusLog - Task 0
output :
uk.org.project.Evaluation.Output.Function4(selected=["Speech"]){}
Now I have to parse this file to find only 'Evaluation.Input.Function1' and its values 'red' and 'yellow', and 'Evaluation.Output.Function2' and its value 'Rocket', ignoring everything else; likewise for the other input and output functions (3 and 4) below. There are many such Input and Output functions, and I have to find such sets of input/output functions. This is my attempted grammar, which is not working. Any help would be appreciated. As this is my first attempt at writing a grammar with ANTLR, it is becoming quite daunting.
grammar test;
tag : inputtag+ outputtag+ ;
//Input tag consists of at least one input function with one or more values
inputtag: INPUTFUNCTIONS INPUTVALUES+;
//Output tag consists of at least one output function with one or more output values
outputtag : OUTPUTFUNCTIONS OUTPUTVALUES+;
INPUTFUNCTIONS
: INFUNCTION1 | INFUNCTION2;
OUTPUTFUNCTIONS
:OUTFUNCTION1 | OUTFUNCTION2;
// Possible input functions in the log file
fragment INFUNCTION1
:'Evaluation.Input.Function1';
fragment INFUNCTION2
:'Evaluation.Input.Function3';
//Possible values in the input functions
INPUTVALUES
: 'red' | 'yellow' | 'blue';
// Possible output functions in the log file
fragment OUTFUNCTION1
:'Evaluation.Output.Function2';
fragment OUTFUNCTION2
:'Evaluation.Output.Function4';
//Possible output values in the output functions
fragment OUTPUTVALUES
: 'Rocket' | 'Speech';
When you're only interested in part of the file you're parsing, you don't need a parser, nor a grammar for the entire file format. A lexer grammar with ANTLR's options{filter=true;} will suffice. That way, you grab only the tokens you defined in your grammar and ignore the rest of the file.
Here's a quick demo:
lexer grammar TestLexer;
options{filter=true;}
@lexer::members {
public static void main(String[] args) throws Exception {
String text =
"7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"+
"\n"+
"7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}\n"+
"\n"+
"7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"]){}\n"+
"\n"+
"7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=[\"Speech\"]){}";
ANTLRStringStream in = new ANTLRStringStream(text);
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
for(Object obj : tokens.getTokens()) {
Token token = (Token)obj;
System.out.println("> token.getText() = "+token.getText());
}
}
}
Input
: 'Evaluation.Input.Function' '0'..'9'+ Params
;
Output
: 'Evaluation.Output.Function' '0'..'9'+ Params
;
fragment
Params
: '(selected=[' String ( ',' String )* '])'
;
fragment
String
: '"' ( ~'"' )* '"'
;
Now generate the lexer, then compile and run it:
java -cp antlr-3.2.jar org.antlr.Tool TestLexer.g
javac -cp antlr-3.2.jar TestLexer.java
java -cp .:antlr-3.2.jar TestLexer // or on Windows: java -cp .;antlr-3.2.jar TestLexer
and you'll see the following being printed to the console:
> token.getText() = Evaluation.Input.Function1(selected=["red","yellow"])
> token.getText() = Evaluation.Output.Function2(selected=["Rocket"])
> token.getText() = Evaluation.Input.Function3(selected=["blue","yellow"])
> token.getText() = Evaluation.Output.Function4(selected=["Speech"])
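For comparison, the same "grab only what matches, skip the rest" idea can be sketched with plain java.util.regex instead of an ANTLR filter lexer. This swaps in a different technique and is not what the grammar above generates; `LogFunctionExtractor` and `extract` are hypothetical names, and the pattern assumes the parameter list always has the `(selected=["..."])` shape shown in the sample log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based counterpart to the filter lexer; not ANTLR code.
class LogFunctionExtractor {
    // Same shape as the Input/Output rules: function name, digits, then the parameter list.
    private static final Pattern CALL = Pattern.compile(
        "Evaluation\\.(?:Input|Output)\\.Function[0-9]+"
        + "\\(selected=\\[\"[^\"]*\"(?:,\"[^\"]*\")*\\]\\)");

    // Returns every matching function call in order; all other text is ignored.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CALL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text =
            "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : "
            + "uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"
            + "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : "
            + "uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}";
        for (String call : extract(text)) {
            System.out.println(call);
        }
    }
}
```

The lexer approach scales better once the tokens need further structure (for example, separate tokens for each selected value), whereas the regex is enough for a one-off extraction.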