SableCC not hitting interpreter methods - sablecc

I am new to SableCC. Just ran the calculator example at http://sablecc.sourceforge.net/thesis/thesis.html#PAGE26. I used the grammar file and interpreter file as they are, and tried to parse simple arithmetic expression like "45 * 5 + 2". The problem is, the interpreter method caseAMultFactor does not seem to be hit. I see it hit caseAPlusExpr, or caseAMinusExpr if I change the "+" to "-". So does the Start.apply(DepthFirstAdapter) method only go through the top mode node? How can I iterate through all nodes like that sample codes seem to do? I am using Java 1.7 and hope that's not a problem.
For your convenience I have pasted the grammar and interpreter codes here. Thanks for your help.
### Grammar:
Package postfix;
Tokens
number = ['0' .. '9']+;
plus = '+';
minus = '-';
mult = '*';
div = '/';
mod = '%';
l_par = '(';
r_par = ')';
blank = (' ' | 13 | 10)+;
Ignored Tokens
blank;
Productions
expr =
{factor} factor |
{plus} expr plus factor |
{minus} expr minus factor;
factor =
{term} term |
{mult} factor mult term |
{div} factor div term |
{mod} factor mod term;
term =
{number} number |
{expr} l_par expr r_par;
### Interpreter:
package postfix.interpret;
import postfix.analysis.DepthFirstAdapter;
import postfix.node.ADivFactor;
import postfix.node.AMinusExpr;
import postfix.node.AModFactor;
import postfix.node.AMultFactor;
import postfix.node.APlusExpr;
import postfix.node.TNumber;
public class Interpreter extends DepthFirstAdapter
{
public void caseTNumber(TNumber node)
{// When we see a number, we print it.
System.out.print(node);
}
public void caseAPlusExpr(APlusExpr node)
{
System.out.println(node);
}
public void caseAMinusExpr(AMinusExpr node)
{
System.out.println(node);
}
public void caseAMultFactor(AMultFactor node)
{// out of alternative {mult} in Factor, we print the mult.
System.out.print(node.getMult());
}
public void outAMultFactor(AMultFactor node)
{// out of alternative {mult} in Factor, we print the mult.
System.out.print(node.getMult());
}
public void outADivFactor(ADivFactor node)
{// out of alternative {div} in Factor, we print the div.
System.out.print(node.getDiv());
}
public void outAModFactor(AModFactor node)
{// out of alternative {mod} in Factor, we print the mod.
System.out.print(node.getMod());
}
}

What you posted looks fine. You did not post any of the output, nor did you post the code to run the interpreter.
Here's my code (I'm omitting the code for Interpreter as it's the same as yours):
package postfix;
import postfix.parser.*;
import postfix.lexer.*;
import postfix.node.*;
import java.io.*;
public class Compiler {
public static void main(String[] arguments) {
try {
Parser p = new Parser(new Lexer(new PushbackReader(
new StringReader("(45 + 36/2) * 3 + 5 * 2"), 1024)));
Start tree = p.parse();
tree.apply(new Interpreter());
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
and when run, it produces this:
45 36 2 / + 3 * 5 2 * +
Note the * is displayed, as expected.
UPDATE 2015-03-09
First, please copy/paste this grammar into a file named postfix.grammar. It should be the same as the one you have, but just copy/paste anyway:
Package postfix;
Tokens
number = ['0' .. '9']+;
plus = '+';
minus = '-';
mult = '*';
div = '/';
mod = '%';
l_par = '(';
r_par = ')';
blank = (' ' | 13 | 10)+;
Ignored Tokens
blank;
Productions
expr =
{factor} factor |
{plus} expr plus factor |
{minus} expr minus factor;
factor =
{term} term |
{mult} factor mult term |
{div} factor div term |
{mod} factor mod term;
term =
{number} number |
{expr} l_par expr r_par;
Next, run this from a command line (make any necessary directory changes, of course):
java -jar "C:\Program Files\Java\sablecc-3.2\lib\sablecc.jar" src\postfix.grammar
Please ensure that you only have the Java classes from this invocation of SableCC (i.e. make sure any previously generated Java classes are deleted). Then using the Compiler class that I previously posted, try again. I cannot think of any problem with the grammar or problem with version 3.2 of SableCC that would cause the problem you're having. I'm hoping a fresh start will fix the problem.

Related

how to report grammar ambiguity in antlr4

According to the antlr4 book (page 159), and using the grammar Ambig.g4, grammar ambiguity can be reported by:
grun Ambig stat -diagnostics
or equivalently, in code form:
parser.removeErrorListeners();
parser.addErrorListener(new DiagnosticErrorListener());
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
The grun command reports the ambiguity properly for me, using antlr-4.5.3. But when I use the code form, I dont get the ambiguity report. Here is the command trace:
$ antlr4 Ambig.g4 # see the book's page.159 for the grammar
$ javac Ambig*.java
$ grun Ambig stat -diagnostics < in1.txt # in1.txt is as shown on page.159
line 1:3 reportAttemptingFullContext d=0 (stat), input='f();'
line 1:3 reportAmbiguity d=0 (stat): ambigAlts={1, 2}, input='f();'
$ javac TestA_Listener.java
$ java TestA_Listener < in1.txt # exits silently
The TestA_Listener.java code is the following:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.*; // for PredictionMode
import java.util.*;
public class TestA_Listener {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
AmbigLexer lexer = new AmbigLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AmbigParser parser = new AmbigParser(tokens);
parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new DiagnosticErrorListener());
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.stat();
}
}
Can somebody please point out how the above java code should be modified, to print the ambiguity report?
For completeness, here is the code Ambig.g4 :
grammar Ambig;
stat: expr ';' // expression statement
| ID '(' ')' ';' // function call statement
;
expr: ID '(' ')'
| INT
;
INT : [0-9]+ ;
ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;
And here is the input file in1.txt :
f();
Antlr4 is a top-down parser, so for the given input, the parse match is unambiguously:
stat -> expr -> ID -> ( -> ) -> stat(cnt'd) -> ;
The second stat alt is redundant and never reached, not ambiguous.
To resolve the apparent redundancy, a predicate might be used:
stat: e=expr {isValidExpr($e)}? ';' #exprStmt
| ID '(' ')' ';' #funcStmt
;
When isValidExpr is false, the function statement alternative will be evaluated.
I waited for several days for other people to post their answers. Finally after several rounds of experimenting, I found an answer:
The following line should be deleted from the above code. Then we get the same ambiguity report as given by grun.
parser.removeErrorListeners(); // remove ConsoleErrorListener
The following code will be work
public static void main(String[] args) throws IOException {
CharStream input = CharStreams.fromStream(System.in);
AmbigLexer lexer = new AmbigLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AmbigParser parser = new AmbigParser(tokens);
//parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new org.antlr.v4.runtime.DiagnosticErrorListener()); // add ours
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.stat(); // parse as usual
}

parse text without explicit endtag

The problem of parsing a dictionary-entry (see examples below) is based on
not having explicit start- and end-tags but:
the end-tag of one element is already the start-tag of the following element
or: the start-tag is not a syntactical element, but it is the current parsing state (so it depends on what you have already "seen" in the inputstream)
example 1, simple entry:
wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF
example 2, entry with multi-definition:
wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF
As I would say it in words or pseudo-code:
dictionary-entry :
word = .+ ' ' // catch everything as word until you see a space
phon = '[' .+ ']' // then follows phonetic, which is everything in brackets
(MultipleMeaning | UniqueMeaning)
MultipleMeaning : Int '.' definition= .+ ':' // a MultipleMeaning has a number
// before the definition
UniqueMeaning : definition= .+ ':'
I've tried the Lexer with gates (antlr-version: 3.2)
#members {
int cs = 0; // current state
}
#lexer::header {
package main;
}
Word :
{cs==0}?=> .+ ' ' {cs=1;} // in this state everything until
; // Space belongs to the Word, now go to Phon-mode
Phon :
{cs==1}?=> '[' .+ ']' {cs=2;} // everything in brackets is phonetic-information
; // after you have seen this go to next state
MultiDef :
{cs==2}?=> Int '.' .+ ':' {cs=3;}
;
Def :
{cs==2}?=> .+ ':' {cs=3;}
;
fragment
Digit :
'0'..'9';
Int :
Digit Digit*;
The TestLexer:
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;
public class TestLexer {
public static void main(String[] args) {
String str = "Word [phon]1.definition:";
CharStream input = new ANTLRStringStream(str);
DudenLexer lexer = new DudenLexer(input);
Token token;
while ((token = lexer.nextToken())!=Token.EOF_TOKEN) {
System.out.println("Token: "+token);
}
}
}
The problems I have:
I get the error-msg: line 1:0 rule Def failed predicate: {cs==2}?
I don't know if this is even the right way to do?
I am stucked for about three days with this and am very thankful for any help and hints.
Thank you,
Tom

How can I build an ANTLR Works style parse tree?

I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites, and yet say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I dont understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?
What I dont understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
#parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
Id
abc
Id
abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
id
abc
id
abc123

variable not passed to predicate method in ANTLR

The java code generated from ANTLR is one rule, one method in most times. But for the following rule:
switchBlockLabels[ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts]
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel[_entity, _method, _preStmts]* switchDefaultLabel? switchCaseLabel*)
;
it generates a submethod named synpred125_TreeParserStage3_fragment(), in which mehod switchCaseLabel(_entity, _method, _preStmts) is called:
synpred125_TreeParserStage3_fragment(){
......
switchCaseLabel(_entity, _method, _preStmts);//variable not found error
......
}
switchBlockLabels(ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts){
......
synpred125_TreeParserStage3_fragment();
......
}
The problem is switchCaseLabel has parameters and the parameters come from the parameters of switchBlockLabels() method, so "variable not found error" occurs.
How can I solve this problem?
My guess is that you've enabled global backtracking in your grammar like this:
options {
backtrack=true;
}
in which case you can't pass parameters to ambiguous rules. In order to communicate between ambiguous rules when you have enabled global backtracking, you must use rule scopes. The "predicate-methods" do have access to rule scopes variables.
A demo
Let's say we have this ambiguous grammar:
grammar Scope;
options {
backtrack=true;
}
parse
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName
: Number
| Name
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
(for the record, the atom+ and numberOrName+ make it ambiguous)
If you now want to pass information between the parse and numberOrName rule, say an integer n, something like this will fail (which is the way you tried it):
grammar Scope;
options {
backtrack=true;
}
parse
#init{int n = 0;}
: (atom[++n])+ EOF
;
atom[int n]
: (numberOrName[n])+
;
numberOrName[int n]
: Number {System.out.println(n + " = " + $Number.text);}
| Name {System.out.println(n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
In order to do this using rule scopes, you could do it like this:
grammar Scope;
options {
backtrack=true;
}
parse
scope{int n; /* define the scoped variable */ }
#init{$parse::n = 0; /* important: initialize the variable! */ }
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName /* increment and print the scoped variable from the parse rule */
: Number {System.out.println(++$parse::n + " = " + $Number.text);}
| Name {System.out.println(++$parse::n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
Test
If you now run the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "foo 42 Bar 666";
ScopeLexer lexer = new ScopeLexer(new ANTLRStringStream(src));
ScopeParser parser = new ScopeParser(new CommonTokenStream(lexer));
parser.parse();
}
}
you will see the following being printed to the console:
1 = foo
2 = 42
3 = Bar
4 = 666
P.S.
I don't know what language you're parsing, but enabling global backtracking is usually overkill and can have quite an impact on the performance of your parser. Computer languages often are ambiguous in just a few cases. Instead of enabling global backtracking, you really should look into adding syntactic predicates, or enabling backtracking on those rules that are ambiguous. See The Definitive ANTLR Reference for more info.

Whats the correct way to add new tokens (rewrite) to create AST nodes that are not on the input steam

I've a pretty basic math expression grammar for ANTLR here and what's of interest is handling the implied * operator between parentheses e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).
Given the input (2-3)(4+5)(6*7) I'm trying to add the missing * operator to the AST tree while parsing, in the following grammar I think I've managed to achieve that but I'm wondering if this is the correct, most elegant way?
grammar G;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ADD = '+' ;
SUB = '-' ;
MUL = '*' ;
DIV = '/' ;
OPARN = '(' ;
CPARN = ')' ;
}
start
: expression EOF!
;
expression
: mult (( ADD^ | SUB^ ) mult)*
;
mult
: atom (( MUL^ | DIV^) atom)*
;
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN expression CPARN -> ^(MUL expression)+
)*
;
INTEGER : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
This grammar appears to output the correct AST Tree in ANTLRworks:
I'm only just starting to get to grips with parsing and ANTLR, don't have much experience so feedback with really appreciated!
Thanks in advance! Carl
First of all, you did a great job given the fact that you've never used ANTLR before.
You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:
options {
output=AST;
}
Also, you don't have to specify the root node for each operator separately. So you don't have to do:
(ADD^ | SUB^)
but the following:
(ADD | SUB)^
will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.
Now, for you AST: you'll probably want to create a binary tree: that way, all internal nodes are operators, and the leafs will be operands which makes the actual evaluating of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;
which produces the following AST given the input "(2-3)(4+5)(6*7)":
(image produced by: graphviz-dev.appspot.com)
The DOT file was generated with the following test-class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
GParser parser = new GParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}