How to tell which non-terminal is missing within an expansion in ANTLR

oC_RangeLiteral
: '*' SP? ( oC_IntegerLiteral SP? )? ( '..' SP? ( oC_IntegerLiteral SP? )? )? ;
Given a parse tree with ctx->oC_IntegerLiteral().size() == 1, how can I tell whether the first one is missing or the second one is missing?
Maybe the question title can be improved based on this concrete question.

You can label rule elements with name = like this:
oC_RangeLiteral
: '*' SP?
( first=oC_IntegerLiteral SP? )?
( '..' SP? ( second=oC_IntegerLiteral SP? )? )?
;
Not sure how to use them with the C++ runtime, but for, say, Java, that'd look like this:
if (ctx.first != null) {
// 'first' exists
}
if (ctx.second != null) {
// 'second' exists
}
EDIT
Is it possible to achieve similar without modifying the grammar?
Sure, but that makes it a lot messier. You'd need to figure out if there is a .. among the children of oC_RangeLiteral and then check if the oC_IntegerLiteral comes before or after this .. token. Something like this:
// Only need to check if 1 child is present: in case of 2 or 0 children, it is clear
if (ctx.oC_IntegerLiteral().size() == 1) {
int indexOfIntegerLiteral = ctx.children.indexOf(ctx.oC_IntegerLiteral().get(0));
OptionalInt indexOfDotDot = java.util.stream.IntStream
.range(0, ctx.children.size())
.filter(i -> ctx.children.get(i).getText().equals(".."))
.findFirst();
System.out.printf("indexOfIntegerLiteral=%d\n", indexOfIntegerLiteral);
System.out.printf("indexOfDotDot=%s\n", indexOfDotDot);
if (indexOfDotDot.isPresent() && indexOfIntegerLiteral > indexOfDotDot.getAsInt()) {
// If there is a ".." and the single `oC_IntegerLiteral` comes after it: it's the second one
}
else {
// otherwise, it's the first `oC_IntegerLiteral`
}
}
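The position check above can be tried without the ANTLR runtime. The following is only a sketch under stated assumptions: a plain list of strings stands in for the texts of ctx.children, and the names RangeLiteralSketch, Bound and classify are hypothetical, not part of any generated parser.

```java
import java.util.Arrays;
import java.util.List;

final class RangeLiteralSketch {
    enum Bound { LOWER, UPPER }

    // `children` models the texts of ctx.children (an assumption for this sketch);
    // `literalIndex` is the position of the single oC_IntegerLiteral among them.
    static Bound classify(List<String> children, int literalIndex) {
        int dotDotIndex = children.indexOf("..");
        // The literal is the second one only if a ".." exists and precedes it.
        return (dotDotIndex >= 0 && literalIndex > dotDotIndex) ? Bound.UPPER : Bound.LOWER;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("*", "1", ".."), 1)); // input "*1.."
        System.out.println(classify(Arrays.asList("*", "..", "2"), 2)); // input "*..2"
        System.out.println(classify(Arrays.asList("*", "3"), 1));       // input "*3"
    }
}
```

Note that "*1.." and "*..2" both contain a ".." and exactly one integer literal, which is why the relative position matters and not just the presence of "..".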

Another way is this:
Give your terminals their own token names, say
STAR: '*';
DOTDOT: '..';
With that your rule becomes:
oC_RangeLiteral
: STAR SP? ( oC_IntegerLiteral SP? )? ( DOTDOT SP? ( oC_IntegerLiteral SP? )? )? ;
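Now in your code you can look for DOTDOT. Its mere presence is not enough, because both *1.. and *..2 contain a DOTDOT and exactly one integer literal, so compare the token positions:
if (ctx->oC_IntegerLiteral().size() == 1) {
auto *dotDot = ctx->DOTDOT();
auto *literal = ctx->oC_IntegerLiteral(0);
if (dotDot && dotDot->getSymbol()->getTokenIndex() < literal->getStart()->getTokenIndex()) {
// the literal follows DOTDOT: it is the second integer literal
} else {
// no DOTDOT, or the literal precedes it: it is the first integer literal
}
}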
As a side note: ctx->oC_IntegerLiteral().size() can be 0, because the literal is optional in both places where it appears in the rule. So just testing for the existence of DOTDOT does not tell you there's at least one integer literal.

Related

antlr rule boolean parameter showing up in syntactic predicate code one level higher, causing compilation errors

I have a grammar that can parse expressions like 1+2-4 or 1+x-y, creating an appropriate structure on the fly which later, given a Map<String, Integer> with appropriate content, can be evaluated numerically (after parsing is complete, i.e. for x or y only known later).
Inside the grammar, there are also places where an expression that can be evaluated on the spot, i.e. does not contain variables, should occur. I figured I could parse these with the same logic, adding a boolean parameter variablesAllowed to the rule, like so:
grammar MiniExprParser;
INT : ('0'..'9')+;
ID : ('a'..'z'| 'A'..'Z')('a'..'z'| 'A'..'Z'| '0'..'9')*;
PLUS : '+';
MINUS : '-';
numexpr returns [Double val]:
expr[false] {$val = /* some evaluation code */ 0.;};
varexpr /*...*/:
expr[true] {/*...*/};
expr[boolean varsAllowed] /*...*/:
e=atomNode[varsAllowed] {/*...*/}
(PLUS e2=atomNode[varsAllowed] {/*...*/}
|MINUS e2=atomNode[varsAllowed] {/*...*/}
)* ;
atomNode[boolean varsAllowed] /*...*/:
(n=INT {/*...*/})
|{varsAllowed}?=> ID {/*...*/}
;
result:
(numexpr) => numexpr {System.out.println("Numeric result: " + $numexpr.val);}
|varexpr {System.out.println("Variable expression: " + $varexpr.text);};
However, the generated Java code does not compile. In the part apparently responsible for the final rule's syntactic predicate, varsAllowed occurs even though the variable is never defined at this level.
/* ... */
else if ( (LA3_0==ID) && ((varsAllowed))) {
int LA3_2 = input.LA(2);
if ( ((synpred1_MiniExprParser()&&(varsAllowed))) ) {
alt3=1;
}
else if ( ((varsAllowed)) ) {
alt3=2;
}
/* ... */
Am I using it wrong? (I am using Eclipse's AntlrIDE 2.1.2 with ANTLR 3.5.2.)
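This problem is part of the hoisting process the parser uses for prediction: the gated predicate {varsAllowed}?=> is hoisted into the prediction code of the rules that call atomNode, where the parameter varsAllowed is not in scope. I encountered the same problem and ended up with a member variable (or a static variable for the C target) instead of a parameter.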

skipping parts of a matched lexical element or token

I would like to match a "{NUM}" and then have the lexer rule return "NUM". So I tried
NUM : ('{' { skip(); }) 'NUM' ('}' { skip(); });
But that seems to skip everything and return empty on a match. Would it be possible to skip parts of a lexer match?
I am using ANTLR 3.4.
Invoking skip() anywhere in your rule will remove the entire token from the lexer, not just certain characters.
What you could do is this:
NUM
: '{NUM}' {setText("NUM");}
;
Or, if NUM is variable, do:
NUM
: '{' 'A'..'Z'+ '}' {setText($text.substring(1, $text.length() - 1));}
;
which removes the first and last char from the token.
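The substring call in that action is ordinary Java, so its effect can be checked in isolation. A minimal sketch, assuming the token text always has at least the two delimiter characters; the class and method names are hypothetical:

```java
final class StripDelimitersSketch {
    // Mirrors $text.substring(1, $text.length() - 1) from the lexer action.
    static String stripFirstAndLast(String text) {
        if (text.length() < 2) {
            throw new IllegalArgumentException("expected at least two delimiter characters");
        }
        return text.substring(1, text.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(stripFirstAndLast("{NUM}")); // prints NUM
    }
}
```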
EDIT
smartnut007 wrote:
Is there an equivalent way to do this for Tokens ?
If you mean how to change the text of tokens inside parser rules, try this:
parser_rule
: LEXER_RULE {$LEXER_RULE.setText("new-text");}
;
LEXER_RULE
: 'old-text'
;

simple math expression parser

I have a simple math expression parser and I want to build the AST by myself (meaning no AST parser). But every node can hold just two operands. So 2+3+4 will result in a tree like this:
+
/ \
2 +
/ \
3 4
The problem is that I am not able to get my grammar to do the recursion. Here is just the "add" part:
add returns [Expression e]
: op1=multiply { $e = $op1.e; Print.ln($op1.text); }
( '+' op2=multiply { $e = new AddOperator($op1.e, $op2.e); Print.ln($op1.e.getClass(), $op1.text, "+", $op2.e.getClass(), $op2.text); }
| '-' op2=multiply { $e = null; } // new MinusOperator
)*
;
But at the end of the day this will produce a single tree like:
+
/ \
2 4
I know where the problem is: it is because an "add" can occur zero or infinitely many times (*), but I do not know how to solve this. I thought of something like:
"add" part:
add returns [Expression e]
: op1=multiply { $e = $op1.e; Print.ln($op1.text); }
( '+' op2=(multiply|add) { $e = new AddOperator($op1.e, $op2.e); Print.ln($op1.e.getClass(), $op1.text, "+", $op2.e.getClass(), $op2.text); }
| '-' op2=multiply { $e = null; } // new MinusOperator
)?
;
But this will give me a recursion error. Any ideas?
I don't have the full grammar to test this solution, but consider replacing this (from the first add rule in the question):
$e = new AddOperator($op1.e, $op2.e);
With this:
$e = new AddOperator($e, $op2.e); //$e instead of $op1.e
This way each iteration over ('+' multiply)* extends e rather than replaces it.
It may require a little playing around to get it right, or you may need a temporary Expression in the rule to keep things managed. Just make sure that the last expression created by the loop is somewhere on the right-hand side of the = operator, as in $e = new XYZ($e, $rhs.e);.
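To see what the accumulation does outside of a grammar, here is a small sketch of the same loop in plain Java. The Expression, Num and AddOperator classes are stand-ins invented for this example (the question does not show its own classes), and the list of operands plays the role of the successive multiply results:

```java
import java.util.Arrays;
import java.util.List;

final class LeftFoldSketch {
    interface Expression {}

    static final class Num implements Expression {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class AddOperator implements Expression {
        final Expression left;
        final Expression right;
        AddOperator(Expression left, Expression right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // Each iteration wraps the tree built so far, like $e = new AddOperator($e, $op2.e).
    static Expression fold(List<Expression> operands) {
        Expression e = operands.get(0); // corresponds to $e = $op1.e
        for (int i = 1; i < operands.size(); i++) {
            e = new AddOperator(e, operands.get(i));
        }
        return e;
    }

    public static void main(String[] args) {
        List<Expression> operands = Arrays.asList(new Num(2), new Num(3), new Num(4));
        System.out.println(fold(operands)); // prints ((2 + 3) + 4)
    }
}
```

The result nests to the left, ((2 + 3) + 4), rather than to the right as in the drawing in the question; every operand is kept, and the shape matches the usual left-to-right evaluation of + and -.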

Is there a way to detect if an optional (? operator) tree grammar rule executed in an action?

path[Scope sc] returns [Path p]
@init{
List<String> parts = new ArrayList<String>();
}
: ^(PATH (id=IDENT{parts.add($id.text);})+ pathIndex? )
{// ACTION CODE
// need to check if pathIndex has executed before running this code.
if ($pathIndex.index >=0 ){
p = new Path($sc, parts, $pathIndex.index);
}else if($pathIndex.pathKey != ""){
p = new Path($sc, parts, $pathIndex.pathKey);
}
}
;
Is there a way to detect if pathIndex was executed? In my action code, I tried testing $pathIndex == null, but ANTLR doesn't let you do that. ANTLRWorks gives a syntax error saying "Missing attribute access on rule scope: pathIndex."
The reason why I need to do this is because in my action code I do:
$pathIndex.index
which returns 0 if the variable that $pathIndex is translated to is null. When you access an attribute, ANTLR generates pathIndex7!=null?pathIndex7.index:0. This causes a problem in my object, because the value I had preset to -1 as an error flag is changed to 0.
There are a couple of options:
1
Put your code inside the optional pathIndex:
rule
: ^(PATH (id=IDENT{parts.add($id.text);})+ (pathIndex {/*pathIndex cannot be null here!*/} )? )
;
2
Use a boolean flag to denote the presence (or absence) of pathIndex:
rule
@init{boolean flag = false;}
: ^(PATH (id=IDENT{parts.add($id.text);})+ (pathIndex {flag = true;} )? )
{
if(flag) {
// ...
}
}
;
EDIT
You could also make pathIndex match nothing so that you don't need to make it optional inside path:
path[Scope sc] returns [Path p]
: ^(PATH (id=IDENT{parts.add($id.text);})+ pathIndex)
{
// code
}
;
pathIndex returns [int index, String pathKey]
@init {
$index = -1;
$pathKey = "";
}
: ( /* some rules here */ )?
;
PS. Realize that the expression $pathIndex.pathKey != "" compares object references rather than string contents, so it will most likely not do what you intend. To compare the contents of strings in Java, use their equals(...) method instead:
!$pathIndex.pathKey.equals("")
or if $pathIndex.pathKey can be null, you can circumvent a NPE by doing:
!"".equals($pathIndex.pathKey)
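The difference between the reference comparison and the two equals(...) forms can be seen in a few lines of plain Java. This is only an illustrative sketch; the class and method names are hypothetical:

```java
final class StringCompareSketch {
    // Null-safe form from above: never throws, and treats null as "not the empty string".
    static boolean differsFromEmpty(String pathKey) {
        return !"".equals(pathKey);
    }

    public static void main(String[] args) {
        String empty = new String(""); // a distinct object with the same (empty) contents
        System.out.println(empty != "");             // true: the references differ
        System.out.println(differsFromEmpty(empty)); // false: the contents are equal
        System.out.println(differsFromEmpty(null));  // true, and no NullPointerException
    }
}
```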
More information would have been helpful. However, if I understand correctly, when a value for the index is not present in the input you want to test for $pathIndex.index == null. This code does that using the pathIndex rule to return the Integer $index to the path rule:
path
: ^(PATH IDENT+ pathIndex?)
{ if ($pathIndex.index == null)
System.out.println("path index is null");
else
System.out.println("path index = " + $pathIndex.index); }
;
pathIndex returns [Integer index]
: DIGIT
{ $index = Integer.parseInt($DIGIT.getText()); }
;
For testing, I created these simple parser and lexer rules:
path : 'path' IDENT+ pathIndex? -> ^(PATH IDENT+ pathIndex?)
;
pathIndex : DIGIT
;
/** lexer rules **/
DIGIT : '0'..'9' ;
IDENT : LETTER+ ;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
When the index is present in the input, as in path a b c 5, the output is:
Tree = (PATH a b c 5)
path index = 5
When the index is not present in the input, as in path a b c, the output is:
Tree = (PATH a b c)
path index is null

ANTLR grammar gives me an upside-down tree

I have a grammar which parses dot notation expressions like this:
a.b.c
memberExpression returns [Expression value]
: i=ID { $value = ParameterExpression($i.value); }
('.' m=memberExpression { $value = MemberExpression($m.value, $i.value); }
)*
;
This parses expressions fine and gives me a tree structure like this:
MemberExpression(
MemberExpression(
ParameterExpression("c"),
"b"
)
, "a"
)
But my problem is that I want a tree that looks like this:
MemberExpression(
MemberExpression(
ParameterExpression("a"),
"b"
)
, "c"
)
for the same expression "a.b.c"
How can I achieve this?
You could do this by collecting all tokens in a java.util.List using ANTLR's convenience += operator and creating the desired tree using a custom method in your @parser::members section:
// grammar def ...
// options ...
@parser::members {
private Expression customTree(List tks) {
// `tks` is a java.util.List containing `CommonToken` objects
}
}
// parser ...
memberExpression returns [Expression value]
: ids+=ID ('.' ids+=ID)* { $value = customTree($ids); }
;
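The body of customTree is left open above. One possible way to fill it in is sketched below, under assumptions: the names are passed as plain strings (in the real parser you would call getText() on each token in the list), and the Expression, ParameterExpression and MemberExpression classes are minimal stand-ins for the ones in the question, which are not shown there.

```java
import java.util.Arrays;
import java.util.List;

final class MemberTreeSketch {
    interface Expression {}

    static final class ParameterExpression implements Expression {
        final String name;
        ParameterExpression(String name) { this.name = name; }
        @Override public String toString() { return "ParameterExpression(\"" + name + "\")"; }
    }

    static final class MemberExpression implements Expression {
        final Expression target;
        final String member;
        MemberExpression(Expression target, String member) { this.target = target; this.member = member; }
        @Override public String toString() { return "MemberExpression(" + target + ", \"" + member + "\")"; }
    }

    // Walk the names left to right, wrapping the tree built so far at each step,
    // so the first name ends up innermost and the last name at the root.
    static Expression customTree(List<String> names) {
        Expression tree = new ParameterExpression(names.get(0));
        for (int i = 1; i < names.size(); i++) {
            tree = new MemberExpression(tree, names.get(i));
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(customTree(Arrays.asList("a", "b", "c")));
    }
}
```

For the names a, b, c this produces the nesting asked for in the question, with "c" at the root and ParameterExpression("a") innermost.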
I think what you are asking for is left recursive, and therefore ANTLR is not a good choice to parse it.
To elaborate, you need C at the root of the tree and therefore your rule would be:
rule: rule ID;
This rule will be uncertain whether it should match
a.b
or
a.b.c