ANTLR rule boolean parameter showing up in syntactic predicate code one level higher, causing compilation errors

I have a grammar that can parse expressions like 1+2-4 or 1+x-y, creating an appropriate structure on the fly which later, given a Map<String, Integer> with appropriate content, can be evaluated numerically (after parsing is complete, i.e. for x or y only known later).
Inside the grammar, there are also places where an expression that can be evaluated on the spot, i.e. does not contain variables, should occur. I figured I could parse these with the same logic, adding a boolean parameter variablesAllowed to the rule, like so:
grammar MiniExprParser;
INT : ('0'..'9')+;
ID : ('a'..'z'| 'A'..'Z')('a'..'z'| 'A'..'Z'| '0'..'9')*;
PLUS : '+';
MINUS : '-';
numexpr returns [Double val]:
expr[false] {$val = /* some evaluation code */ 0.;};
varexpr /*...*/:
expr[true] {/*...*/};
expr[boolean varsAllowed] /*...*/:
e=atomNode[varsAllowed] {/*...*/}
(PLUS e2=atomNode[varsAllowed] {/*...*/}
|MINUS e2=atomNode[varsAllowed] {/*...*/}
)* ;
atomNode[boolean varsAllowed] /*...*/:
(n=INT {/*...*/})
|{varsAllowed}?=> ID {/*...*/}
;
result:
(numexpr) => numexpr {System.out.println("Numeric result: " + $numexpr.val);}
|varexpr {System.out.println("Variable expression: " + $varexpr.text);};
However, the generated Java code does not compile. In the part apparently responsible for the final rule's syntactic predicate, varsAllowed occurs even though the variable is never defined at that level.
/* ... */
else if ( (LA3_0==ID) && ((varsAllowed))) {
int LA3_2 = input.LA(2);
if ( ((synpred1_MiniExprParser()&&(varsAllowed))) ) {
alt3=1;
}
else if ( ((varsAllowed)) ) {
alt3=2;
}
/* ... */
Am I using it wrong? (I am using Eclipse's AntlrIDE 2.1.2 with ANTLR 3.5.2.)

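This is a side effect of predicate hoisting: to predict which alternative to take, ANTLR 3 copies the gated semantic predicate {varsAllowed}?=> into the decision code of the calling rules, where the parameter varsAllowed is not in scope. I encountered the same problem and ended up using a member variable (or a static variable for the C target) instead of a parameter.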

Related

Antlr: how to switch on token type in Visitor implementation

I'm playing around with ANTLR, designing a toy language, which I think is where most people start! I have a question on how best to think about switching on token type.
Consider a 'function call' in the language, where a function can consume a string, a number or a variable, for example like the below (project() is the function call):
project("ABC") vs project(123) vs project($SOME_VARIABLE)
I have the alternation operator in my grammar, so the grammar parses the right thing, but in the visitor code, it would be nice to tell the difference between the three versions of the above.
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
try {
s1 = ctx.STRING_LITERAL().getText();
}catch(Exception e){}
try{
s2 = ctx.NUM().getText();
}catch(Exception e){}
System.out.println("Created Project via => " + ctx.getChild(1).toString());
}
The code above works: depending on whether s1 or s2 is null, I can infer how I was called (with a literal or a number; I haven't shown the variable case above). But I'm interested in whether there is a better or more elegant way, for example switching on token type inside the visitor code to actually process the language.
The grammar I had for the above was
createproj: 'project('WS?(STRING_LITERAL|NUM)')';
and when I use the intellij antlr plugin, it seems to know the token type of the argument to the project() function - but I don't seem to be able to get to it from my code.
You could do something like this:
createproj
: 'project' '(' WS? param ')'
;
param
: STRING_LITERAL
| NUM
;
and in your visitor code:
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
switch(ctx.param().start.getType()) {
case YourLexerName.STRING_LITERAL:
...
case YourLexerName.NUM:
...
...
}
}
So by inlining the token in the grammar I had originally, I've lost the opportunity to inspect it in the visitor code?
Not really, you could also do it like this:
createproj
: 'project' '(' WS? param_token=(STRING_LITERAL | NUM) ')'
;
and could then do this:
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
switch(ctx.param_token.getType()) {
case YourLexerName.STRING_LITERAL:
...
case YourLexerName.NUM:
...
...
}
}
Just make sure you don't mix lexer rules (tokens) and parser rules in your set param_token=( ... ). When it's a parser rule, ctx.param_token.getType() will fail (it must then be ctx.param_token.start.getType()). That is why I recommended adding an extra parser rule, because this would then still work:
param
: STRING_LITERAL
| NUM
| some_parser_rule
;

What could be a reason for `_localctx` being null in an antlr4 semantic predicate?

I'm using list labels to gather tokens and semantic predicates to validate sequences in my parser grammar. E.g.
line
:
(text+=WORD | text+=NUMBER)+ ((BLANK | SKIP)+ (text+=WORD | text+=NUMBER)+)+
{Parser.validateContext(_localctx)}?
(BLANK | SKIP)*
;
where
WORD: [\u0021-\u002F\u003A-\u007E]+; // printable ASCII characters (excluding SP and numbers)
NUMBER: [\u0030-\u0039]+; // printable ASCII number characters
BLANK: '\u0020';
SKIP: '\u0020\u0020' | '\t'; // two SPs or a HT symbol
The part of Parser.validateContext used to validate the line rule would be implemented like this
private static final boolean validateContext(ParserRuleContext context) {
//.. other contexts
if(context instanceof LineContext)
return "<reference-sequence>".equals(Parser.joinTokens(((LineContext) context).text, " "));
return false;}
where Parser.joinTokens is defined as
private static String joinTokens(java.util.List<org.antlr.v4.runtime.Token> tokens, String delimiter) {
StringBuilder builder = new StringBuilder();
int i = 0, n;
if((n = tokens.size()) == 0) return "";
builder.append(tokens.get(0).getText());
while(++i < n) builder.append(delimiter + tokens.get(i).getText());
return builder.toString();}
Both are put in a @parser::members clause at the beginning of the grammar file.
My problem is this: sometimes the _localctx reference is null and I receive "no viable alternative" errors. These are probably caused by the failing predicate guarding the respective rule when no other alternative is given.
Is there a reason–potentially an error on my part–why _localctx would be null?
UPDATE: The answer to this question seems to suggest that semantic predicates are also called during prediction. Maybe during prediction no context is created and _localctx is set to null.
The semantics of _localctx in a predicate are not defined. Allowable behavior includes, but is not limited to the following (and may change during any release):
Failing to compile (no identifier with that name)
Using the wrong context object
Not having a context object (null)
To reference the context of the current rule from within a predicate, you need to use $ctx instead.
Note that the same applies for rule parameters, locals, and/or return values which are used in a predicate. For example, the parameter a cannot be referenced as a, but must instead be $a.

How to get the evaluation result from the parser expression when using antlr 3?

I'm using ANTLR 3.5. I would like to build a grammar that evaluates boolean expressions like
x=true;
b=false;
c=true;
a=x&&b||c;
and get back the evaluation result via a Java call (for example, ExprParser.eval() on the above input would return true).
I'm looking forward to an example.
You can do something like below (using the context of the grammar that I linked to in the comments to the question):
First of all, declare a member to store the latest evaluation result:
@members {
private int __value;
}
Then, set it whenever you compute something
stat: expr NEWLINE { __value = $expr.value; } | // rest of the stat entry
And, finally, return it when all the stats are computed:
// will return 0 if no expr blocks were evaluated
public prog returns [int value]: stat+ {$value = __value;};
In C#, I used a slightly different approach: I added an event to the parser and raised it whenever an expression result could be computed. A client can subscribe to this event and receive all the computation results.
@members
{
public event Action<int> Computed;
}
stat: expr NEWLINE { Computed($expr.value); }
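For the Java target, a similar effect can probably be had with a callback field in place of the C# event. The following is only a sketch of the wiring: CallbackParserSketch, onComputed and stat are hypothetical stand-ins for the generated parser and its rule method, not ANTLR API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for a generated parser; only the callback wiring is shown.
final class CallbackParserSketch {
    // No-op until a client subscribes.
    private IntConsumer computed = value -> {};

    // Clients register here; listeners are chained in subscription order.
    void onComputed(IntConsumer listener) {
        computed = computed.andThen(listener);
    }

    // In a real grammar this call would live in the stat action,
    // e.g. { computed.accept($expr.value); } (assumption, not generated code).
    void stat(int exprValue) {
        computed.accept(exprValue);
    }
}

final class CallbackDemo {
    // Simulates two evaluated statements and collects what the listener saw.
    static List<Integer> run() {
        List<Integer> results = new ArrayList<>();
        CallbackParserSketch parser = new CallbackParserSketch();
        parser.onComputed(value -> results.add(value));
        parser.stat(1);
        parser.stat(0);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Compared with the single __value member, the callback delivers every intermediate result rather than only the last one.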

ANTLR grammar gives me an upside-down tree

I have a grammar which parses dot notation expressions like this:
a.b.c
memberExpression returns [Expression value]
: i=ID { $value = ParameterExpression($i.value); }
('.' m=memberExpression { $value = MemberExpression($m.value, $i.value); }
)*
;
This parses expressions fine and gives me a tree structure like this:
MemberExpression(
MemberExpression(
ParameterExpression("c"),
"b"
)
, "a"
)
But my problem is that I want a tree that looks like this:
MemberExpression(
MemberExpression(
ParameterExpression("a"),
"b"
)
, "c"
)
for the same expression "a.b.c"
How can I achieve this?
You could do this by collecting all tokens in a java.util.List using ANTLR's convenience += operator and creating the desired tree using a custom method in your @parser::members section:
// grammar def ...
// options ...
@parser::members {
private Expression customTree(List tks) {
// `tks` is a java.util.List containing `CommonToken` objects
}
}
// parser ...
memberExpression returns [Expression value]
: ids+=ID ('.' ids+=ID)* { $value = customTree($ids); }
;
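One way the body of customTree might look is a left fold over the collected names: the first ID becomes the innermost ParameterExpression and each later ID wraps the tree built so far. This is a sketch under assumptions: the Expression classes below are hypothetical stand-ins for the question's types, and the list holds plain strings (in the grammar you would pass each token's getText()).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the question's Expression types.
abstract class Expression {}

final class ParameterExpression extends Expression {
    final String name;
    ParameterExpression(String name) { this.name = name; }
    @Override public String toString() {
        return "ParameterExpression(\"" + name + "\")";
    }
}

final class MemberExpression extends Expression {
    final Expression target;
    final String member;
    MemberExpression(Expression target, String member) {
        this.target = target;
        this.member = member;
    }
    @Override public String toString() {
        return "MemberExpression(" + target + ", \"" + member + "\")";
    }
}

final class MemberTreeDemo {
    // Left fold: start with the first name, then wrap once per remaining name.
    // Assumes the list is non-empty, which the rule ids+=ID ('.' ids+=ID)* guarantees.
    static Expression customTree(List<String> names) {
        Expression result = new ParameterExpression(names.get(0));
        for (String name : names.subList(1, names.size())) {
            result = new MemberExpression(result, name);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(customTree(Arrays.asList("a", "b", "c")));
    }
}
```

For the input a, b, c this yields the nesting asked for in the question, with "c" as the outermost member.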
I think what you are asking for requires a left-recursive rule, which ANTLR 3 cannot handle directly, so ANTLR is not a good choice for parsing it this way.
To elaborate, you need c at the root of the tree and therefore your rule would be:
rule: rule ID;
This rule will be uncertain whether it should match
a.b
or
a.b.c

Processing an n-ary ANTLR AST one child at a time

I currently have a compiler that uses an AST where all children of a code block are on the same level (ie, block.children == {stm1, stm2, stm3, etc...}). I am trying to do liveness analysis on this tree, which means that I need to take the value returned from the processing of stm1 and then pass it to stm2, then take the value returned by stm2 and pass it to stm3, and so on. I do not see a way of executing the child rules in this fashion when the AST is structured this way.
Is there a way to allow me to chain the execution of the child grammar items with my given AST, or am I going to have to go through the painful process of refactoring the parser to generate a nested structure and updating the rest of the compiler to work with the new AST?
Example ANTLR grammar fragment:
block
: ^(BLOCK statement*)
;
statement
: // stuff
;
What I hope I don't have to go to:
block
: ^(BLOCK statementList)
;
statementList
: ^(StmLst statement statement+)
| ^(StmLst statement)
;
statement
: // stuff
;
Parser (or lexer) rules can take parameter values and can return a value. So, in your case, you can do something like:
block
@init {Object o = null; /* initialize the value being passed through */ }
: ^(BLOCK (s=statement[o] {o = $s.returnValue; /* re-assign 'o' */ } )*)
;
statement [Object parameter] returns [Object returnValue]
: // do something with 'parameter' and 'returnValue'
;
Here's a very simple example that you can use to play around with:
grammar Test;
@members{
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("1;2;3;4;");
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
parser.parse();
}
}
parse
: block EOF
;
block
@init{int temp = 0;}
: (i=statement[temp] {temp = $i.ret;} ';')+
;
statement [int param] returns [int ret]
: Number {$ret = $param + Integer.parseInt($Number.text);}
{System.out.printf("param=\%d, Number=\%s, ret=\%d\n", $param, $Number.text, $ret);}
;
Number
: '0'..'9'+
;
When you've generated a parser and lexer from it and compiled these classes, execute the TestParser class and you'll see the following printed to your console:
param=0, Number=1, ret=1
param=1, Number=2, ret=3
param=3, Number=3, ret=6
param=6, Number=4, ret=10
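Stripped of the grammar, the chaining above amounts to threading one local variable through a loop. The plain-Java sketch below mirrors that control flow; ChainDemo, block and statement are hypothetical names chosen to match the rules, not code that ANTLR generates.

```java
import java.util.Arrays;
import java.util.List;

final class ChainDemo {
    // Stand-in for the statement rule: receives the value so far, returns the new one.
    static int statement(int param, String numberText) {
        return param + Integer.parseInt(numberText);
    }

    // Stand-in for the block rule: temp plays the role of the @init local
    // and is re-assigned after every statement.
    static int block(List<String> numbers) {
        int temp = 0;
        for (String n : numbers) {
            temp = statement(temp, n);
        }
        return temp;
    }

    public static void main(String[] args) {
        System.out.println(block(Arrays.asList("1", "2", "3", "4"))); // prints 10
    }
}
```

For a liveness analysis the int would be replaced by whatever set or map the analysis carries from one statement to the next.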