ANTLR3 - Decision can match input using multiple alternatives - antlr

When running ANTLR3 on the following code, I get the message - warning(200): MYGRAMMAR.g:40:36: Decision can match input such as "QMARK" using multiple alternatives: 3, 4
As a result, alternative(s) 4 were disabled for that input.
The warning message is pointing me to postfixExpr. Is there a way to fix this?
grammar MYGRAMMAR;
options {language = C;}
tokens {
BANG = '!';
COLON = ':';
FALSE_LITERAL = 'false';
GREATER = '>';
LSHIFT = '<<';
MINUS = '-';
MINUS_MINUS = '--';
PLUS = '+';
PLUS_PLUS = '++';
QMARK = '?';
QMARK_COLON = '?:';
TILDE = '~';
TRUE_LITERAL = 'true';
}
condExpr
: shiftExpr (QMARK condExpr COLON condExpr)? ;
shiftExpr
: addExpr ( shiftOp addExpr)* ;
addExpr
: qmarkColonExpr ( addOp qmarkColonExpr)* ;
qmarkColonExpr
: prefixExpr ( QMARK_COLON prefixExpr )? ;
prefixExpr
: ( prefixOrUnaryMinus | postfixExpr) ;
prefixOrUnaryMinus
: prefixOp prefixExpr ;
postfixExpr
: primaryExpr ( postfixOp | BANG | QMARK )*;
primaryExpr
: literal ;
shiftOp
: ( LSHIFT | rShift);
addOp
: (PLUS | MINUS);
prefixOp
: ( BANG | MINUS | TILDE | PLUS_PLUS | MINUS_MINUS );
postfixOp
: (PLUS_PLUS | MINUS_MINUS);
rShift
: (GREATER GREATER)=> a=GREATER b=GREATER {assertNoSpace($a,$b)}? ;
literal
: ( TRUE_LITERAL | FALSE_LITERAL );
assertNoSpace [pANTLR3_COMMON_TOKEN t1, pANTLR3_COMMON_TOKEN t2]
: { $t1->line == $t2->line && $t1->getCharPositionInLine($t1) + 1 == $t2->getCharPositionInLine($t2) }? ;

I think one problem is that PLUS_PLUS as well as MINUS_MINUS will never be matched as they are defined after the respective PLUS or MINUS token. therefore the lexer will always output two PLUS tokens instead of one PLUS_PLUS token.
In order to avaoid something like this you have to define your PLUS_PLUS or MINUS_MINUS token before the PLUS or MINUS token as the lexer processes them in the order they are defined and won't look any further once it found a way to match the current input.
The same problem applies to QMARK_COLON as it is defined after QMARK (this only is a problem because there is another token type COLON to match the following colon).
See if fixing the ambiguities resolves the error message.

Related

Why parser splits command name into different nodes

I have the statement:
=MYFUNCTION_NAME(1,2,3)
My grammar is:
grammar Expression;
options
{
language=CSharp3;
output=AST;
backtrack=true;
}
tokens
{
FUNC;
PARAMS;
}
#parser::namespace { Expression }
#lexer::namespace { Expression }
public
parse : ('=' func )*;
func : funcId '(' formalPar* ')' -> ^(FUNC funcId formalPar);
formalPar : (par ',')* par -> ^(PARAMS par+);
par : INT;
funcId : complexId+ ('_'? complexId+)*;
complexId
: ID+
| ID+DIGIT+ ;
ID : ('a'..'z'|'A'..'Z'|'а'..'я'|'А'..'Я')+;
DIGIT : ('0'..'9')+;
INT : '-'? ('0'..'9')+;
In a tree i get:
[**FUNC**]
|
[MYFUNCTION] [_] [NAME] [**PARAMS**]
Why the parser splits function's name into 3 nodes: "MYFUNCTION, "_", "NAME" ? How can i fix it?
The division is always performed based on tokens. Since the ID token cannot contain an _ character, the result is 3 separate tokens that are handled later by the funcId grammar rule. To create a single node for your function name, you'll need to create a lexer rule that can match the input MYFUNCTION_NAME as a single token.

Antrl lexer/parser exception understanding

I have the following language i wish to parse using antlr 1.2.2.
TEST <name>
{
<param_name> = <param value>;
}
while
<...> - means user value, not part of the language keywords
for example
TEST myTest
{
my_param = 1.0;
}
the value can be an integer, a real or a quated string
my_param = 1.0;, my_param = 1; and my_param = "myStringValue"; are all valid inputs.
here is the grammer for this parsing.
parse_test : TESTKEYWORD TEST_NAME '{' param_value_def '}';
param_value_def : ID EQUALS param_value ';';
param_value : REAL|INTEGER|QUOTED_STRING;
TESTKEYWORD : 'TEST';
QUOTED_STRING : '"' ~('"')* '"';
INTEGER : MINUS? DIGIT DIGIT*
REAL : INTEGER '.' DIGIT DIGIT*;
EQUALS : '=';
fragment
MINUS : '-';
fragment
DIGIT : '0'..'9';
when i feed the sample input to the antlr interpreter, i get a `MismatchedTokenException' related to the param_value rule.
can you help me cipher the error message and what i am doing wrong?
thanks
Although ANTLRWorks is not a tool well written, you can use its debugger to see which token in the input leads to this exception, and then you can see which rules need to be revised (since you did not post the full grammar).
http://www.antlr.org/works/index.html

ANTLR Parser, need to which parser rule is matched

In ANTLR, for a given token, is there a way to tell which parser rule is matched?
For example, from the ANTLR grammar:
tokens
{
ADD='Add';
SUB='Sub';
}
fragment
ANYDIGIT : '0'..'9';
fragment
UCASECHAR : 'A'..'Z';
fragment
LCASECHAR : 'a'..'z';
fragment
DATEPART : ('0'..'1') (ANYDIGIT) '/' ('0'..'3') (ANYDIGIT) '/' (ANYDIGIT) (ANYDIGIT) (ANYDIGIT) (ANYDIGIT);
fragment
TIMEPART : ('0'..'2') (ANYDIGIT) ':' ('0'..'5') (ANYDIGIT) ':' ('0'..'5') (ANYDIGIT);
SPACE : ' ';
NEWLINE : '\r'? '\n';
TAB : '\t';
FORMFEED : '\f';
WS : (SPACE|NEWLINE|TAB|FORMFEED)+ {$channel=HIDDEN;};
IDENTIFIER : (LCASECHAR|UCASECHAR|'_') (LCASECHAR|UCASECHAR|ANYDIGIT|'_')*;
TIME : '\'' (TIMEPART) '\'';
DATE : '\'' (DATEPART) (' ' (TIMEPART))? '\'';
STRING : '\''! (.)* '\''!;
DOUBLE : (ANYDIGIT)+ '.' (ANYDIGIT)+;
INT : (ANYDIGIT)+;
literal : INT|DOUBLE|STRING|DATE|TIME;
var : IDENTIFIER;
param : literal|fcn_call|var;
fcn_name : ADD |
SUB |
DIVIDE |
MOD |
DTSECONDSBETWEEN |
DTGETCURRENTDATETIME |
APPEND |
STRINGTOFLOAT;
fcn_call : fcn_name WS? '('! WS? ( param WS? ( ','! WS? param)*)* ')'!;
expr : fcn_call WS? EOF;
And in Java:
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
nodes.reset();
Object obj;
while((obj = nodes.nextElement()) != null)
{
if(nodes.isEOF(obj))
{
break;
}
System.out.println(obj);
}
So, what I want to know, at System.out.println(obj), did the node match the fcn_name rule, or did it match the var rule.
The reason being, I am trying to handle vars differently than fcn_names.
Add this to your listener/visitor:
String[] ruleNames;
public void loadParser(gramParser parser) { //get parser
ruleNames = parser.getRuleNames(); //load parser rules from parser
}
Call loadParser() from wherever you create your listener/visitor, eg.:
MyParser parser = new MyParser(tokens);
MyListener listener = new MyListener();
listener.loadParser(parser); //so we can access rule names
Then inside each rule you can get the name of the rule like this:
ruleName = ruleNames[ctx.getRuleIndex()];
No, you cannot get the name of a parser rule (at least, not without an ugly hack ➊).
But if tree is an instance of CommonTree, it means you've already invoked the expr rule of your parser, which means you already know expr matches first (which in its turn matches fcn_name).
➊ On a related note, see: Get active Antlr rule

variable not passed to predicate method in ANTLR

The java code generated from ANTLR is one rule, one method in most times. But for the following rule:
switchBlockLabels[ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts]
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel[_entity, _method, _preStmts]* switchDefaultLabel? switchCaseLabel*)
;
it generates a submethod named synpred125_TreeParserStage3_fragment(), in which mehod switchCaseLabel(_entity, _method, _preStmts) is called:
synpred125_TreeParserStage3_fragment(){
......
switchCaseLabel(_entity, _method, _preStmts);//variable not found error
......
}
switchBlockLabels(ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts){
......
synpred125_TreeParserStage3_fragment();
......
}
The problem is switchCaseLabel has parameters and the parameters come from the parameters of switchBlockLabels() method, so "variable not found error" occurs.
How can I solve this problem?
My guess is that you've enabled global backtracking in your grammar like this:
options {
backtrack=true;
}
in which case you can't pass parameters to ambiguous rules. In order to communicate between ambiguous rules when you have enabled global backtracking, you must use rule scopes. The "predicate-methods" do have access to rule scopes variables.
A demo
Let's say we have this ambiguous grammar:
grammar Scope;
options {
backtrack=true;
}
parse
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName
: Number
| Name
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
(for the record, the atom+ and numberOrName+ make it ambiguous)
If you now want to pass information between the parse and numberOrName rule, say an integer n, something like this will fail (which is the way you tried it):
grammar Scope;
options {
backtrack=true;
}
parse
#init{int n = 0;}
: (atom[++n])+ EOF
;
atom[int n]
: (numberOrName[n])+
;
numberOrName[int n]
: Number {System.out.println(n + " = " + $Number.text);}
| Name {System.out.println(n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
In order to do this using rule scopes, you could do it like this:
grammar Scope;
options {
backtrack=true;
}
parse
scope{int n; /* define the scoped variable */ }
#init{$parse::n = 0; /* important: initialize the variable! */ }
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName /* increment and print the scoped variable from the parse rule */
: Number {System.out.println(++$parse::n + " = " + $Number.text);}
| Name {System.out.println(++$parse::n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
Test
If you now run the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "foo 42 Bar 666";
ScopeLexer lexer = new ScopeLexer(new ANTLRStringStream(src));
ScopeParser parser = new ScopeParser(new CommonTokenStream(lexer));
parser.parse();
}
}
you will see the following being printed to the console:
1 = foo
2 = 42
3 = Bar
4 = 666
P.S.
I don't know what language you're parsing, but enabling global backtracking is usually overkill and can have quite an impact on the performance of your parser. Computer languages often are ambiguous in just a few cases. Instead of enabling global backtracking, you really should look into adding syntactic predicates, or enabling backtracking on those rules that are ambiguous. See The Definitive ANTLR Reference for more info.

Whats the correct way to add new tokens (rewrite) to create AST nodes that are not on the input steam

I've a pretty basic math expression grammar for ANTLR here and what's of interest is handling the implied * operator between parentheses e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).
Given the input (2-3)(4+5)(6*7) I'm trying to add the missing * operator to the AST tree while parsing, in the following grammar I think I've managed to achieve that but I'm wondering if this is the correct, most elegant way?
grammar G;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ADD = '+' ;
SUB = '-' ;
MUL = '*' ;
DIV = '/' ;
OPARN = '(' ;
CPARN = ')' ;
}
start
: expression EOF!
;
expression
: mult (( ADD^ | SUB^ ) mult)*
;
mult
: atom (( MUL^ | DIV^) atom)*
;
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN expression CPARN -> ^(MUL expression)+
)*
;
INTEGER : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
This grammar appears to output the correct AST Tree in ANTLRworks:
I'm only just starting to get to grips with parsing and ANTLR, don't have much experience so feedback with really appreciated!
Thanks in advance! Carl
First of all, you did a great job given the fact that you've never used ANTLR before.
You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:
options {
output=AST;
}
Also, you don't have to specify the root node for each operator separately. So you don't have to do:
(ADD^ | SUB^)
but the following:
(ADD | SUB)^
will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.
Now, for you AST: you'll probably want to create a binary tree: that way, all internal nodes are operators, and the leafs will be operands which makes the actual evaluating of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;
which produces the following AST given the input "(2-3)(4+5)(6*7)":
(image produced by: graphviz-dev.appspot.com)
The DOT file was generated with the following test-class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
GParser parser = new GParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}