Is there way to detect if an optional (? operator) tree grammar rule executed in an action? - antlr

path[Scope sc] returns [Path p]
#init{
List<String> parts = new ArrayList<String>();
}
: ^(PATH (id=IDENT{parts.add($id.text);})+ pathIndex? )
{// ACTION CODE
// need to check if pathIndex has executed before running this code.
if ($pathIndex.index >=0 ){
p = new Path($sc, parts, $pathIndex.index);
}else if($pathIndex.pathKey != ""){
p = new Path($sc, parts, $pathIndex.pathKey);
}
;
Is there a way to detect if pathIndex was executed? In my action code, I tried testing $pathIndex == null, but ANTLR doesn't let you do that. ANTLRWorks gives a syntax error which saying "Missing attribute access on rule scope: pathIndex."
The reason why I need to do this is because in my action code I do:
$pathIndex.index
which returns 0 if the variable $pathIndex is translated to is null. When you are accessing an attribute, ANTLR generates pathIndex7!=null?pathIndex7.index:0 This causes a problem with an object because it changes a value I have preset to -1 as an error flag to 0.

There are a couple of options:
1
Put your code inside the optional pathIndex:
rule
: ^(PATH (id=IDENT{parts.add($id.text);})+ (pathIndex {/*pathIndex cannot be null here!*/} )? )
;
2
Use a boolean flag to denote the presence (or absence) of pathIndex:
rule
#init{boolean flag = false;}
: ^(PATH (id=IDENT{parts.add($id.text);})+ (pathIndex {flag = true;} )? )
{
if(flag) {
// ...
}
}
;
EDIT
You could also make pathIndex match nothing so that you don't need to make it optional inside path:
path[Scope sc] returns [Path p]
: ^(PATH (id=IDENT{parts.add($id.text);})+ pathIndex)
{
// code
}
;
pathIndex returns [int index, String pathKey]
#init {
$index = -1;
$pathKey = "";
}
: ( /* some rules here */ )?
;
PS. Realize that the expression $pathIndex.pathKey != "" will most likely evaluate to false. To compare the contents of strings in Java, use their equals(...) method instead:
!$pathIndex.pathKey.equals("")
or if $pathIndex.pathKey can be null, you can circumvent a NPE by doing:
!"".equals($pathIndex.pathKey)

More information would have been helpful. However, if I understand correctly, when a value for the index is not present in the input you want to test for $pathIndex.index == null. This code does that using the pathIndex rule to return the Integer $index to the path rule:
path
: ^(PATH IDENT+ pathIndex?)
{ if ($pathIndex.index == null)
System.out.println("path index is null");
else
System.out.println("path index = " + $pathIndex.index); }
;
pathIndex returns [Integer index]
: DIGIT
{ $index = Integer.parseInt($DIGIT.getText()); }
;
For testing, I created these simple parser and lexer rules:
path : 'path' IDENT+ pathIndex? -> ^(PATH IDENT+ pathIndex?)
;
pathIndex : DIGIT
;
/** lexer rules **/
DIGIT : '0'..'9' ;
IDENT : LETTER+ ;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
When the index is present in the input, as in path a b c 5, the output is:
Tree = (PATH a b c 5)
path index = 5
When the index is not present in the input, as in path a b c, the output is:
Tree = (PATH a b c)
path index is null

Related

PEGJS predicate grammar

I need to create a grammar with the help of predicate. The below grammar fails for the given case.
startRule = a:namespace DOT b:id OPEN_BRACE CLOSE_BRACE {return {"namespace": a, "name": b}}
namespace = id (DOT id)*
DOT = '.';
OPEN_BRACE = '(';
CLOSE_BRACE = ')';
id = [a-zA-Z]+;
It fails for the given input as
com.mytest.create();
which should have given "create" as value of "name" key in the result part.
Any help would be great.
There are several things here.
The most important, is that you must be aware that PEG is greedy. That means that your (DOT id)* rule matches ALL the DOT id sequences, including the one that you have in startRule as DOT b:id.
That can be solved using lookahead.
The other thing is that you must remember to use join, since by default it will return each character as the member of an array.
I also added a rule for semicolons.
Try this:
start =
namespace:namespace DOT name:string OPEN_BRACE CLOSE_BRACE SM nl?
{
return { namespace : namespace, name : name };
}
/* Here I'm using the lookahead: (member !OPEN_BRACE)* */
namespace =
first:string rest:(member !OPEN_BRACE)*
{
rest = rest.map(function (x) { return x[0]; });
rest.unshift(first);
return rest;
}
member =
DOT str:string
{ return str; }
DOT =
'.'
OPEN_BRACE =
'('
CLOSE_BRACE =
')'
SM =
';'
nl =
"\n"
string =
str:[a-zA-Z]+
{ return str.join(''); }
And as far I can tell, I'm parsing that line correctly.

My simple ANTLR grammar ignores certain invalid tokens when parsing

I asked a question a couple of weeks ago about my ANTLR grammar (My simple ANTLR grammar is not working as expected). Since asking that question, I've done more digging and debugging and gotten most of the kinks out. I am left with one issue, though.
My generated parser code is not picking up invalid tokens in one particular part of the text that is processed. The lexer is properly breaking things into tokens, but the parser does not kick out invalid tokens in some cases. In particular, when the invalid token is at the end of a phrase like "A and "B", the parser ignores it - it's like the token isn't even there.
Some specific examples:
"A and B" - perfectly valid
"A# and B" - parser properly picks up the invalid # token
"A and #B" - parser properly picks up the invalid # token
"A and B#" - here's the mystery - the lexer finds the # token and the parser IGNORES it (!)
"(A and B#) or C" - further mystery - the lexer finds the # token and the parser IGNORES it (!)
Here is my grammar:
grammar QvidianPlaybooks;
options{ language=CSharp3; output=AST; ASTLabelType = CommonTree; }
public parse
: expression
;
LPAREN : '(' ;
RPAREN : ')' ;
ANDOR : 'AND'|'and'|'OR'|'or';
NAME : ('A'..'Z');
WS : ' ' { $channel = Hidden; };
THEREST : .;
// ***************** parser rules:
expression : anexpression EOF!;
anexpression : atom (ANDOR^ atom)*;
atom : NAME | LPAREN! anexpression RPAREN!;
The code that then processes the resulting tree looks like this:
... from the main program
QvidianPlaybooksLexer lexer = new QvidianPlaybooksLexer(new ANTLRStringStream(src));
QvidianPlaybooksParser parser = new QvidianPlaybooksParser(new CommonTokenStream(lexer));
parser.TreeAdaptor = new CommonTreeAdaptor();
CommonTree tree = (CommonTree)parser.parse().Tree;
ValidateTree(tree, 0, iValidIdentifierCount);
// recursive code that walks the tree
public static RuleLogicValidationResult ValidateTree(ITree Tree, int depth, int conditionCount)
{
RuleLogicValidationResult rlvr = null;
if (Tree != null)
{
CommonErrorNode commonErrorNode = Tree as CommonErrorNode;
if (null != commonErrorNode)
{
rlvr = new RuleLogicValidationResult();
rlvr.IsValid = false;
rlvr.ErrorType = LogicValidationErrorType.Other;
Console.WriteLine(rlvr.ToString());
}
else
{
string strTree = Tree.ToString();
strTree = strTree.Trim();
strTree = strTree.ToUpper();
if ((Tree.ChildCount != 0) && (Tree.ChildCount != 2))
{
rlvr = new RuleLogicValidationResult();
rlvr.IsValid = false;
rlvr.ErrorType = LogicValidationErrorType.Other;
rlvr.InvalidIdentifier = strTree;
rlvr.ErrorPosition = 0;
Console.WriteLine(String.Format("CHILD COUNT of {0} = {1}", strTree, tree.ChildCount));
}
// if the current node is valid, then validate the two child nodes
if (null == rlvr || rlvr.IsValid)
{
// output the tree node
for (int i = 0; i < depth; i++)
{
Console.Write(" ");
}
Console.WriteLine(Tree);
rlvr = ValidateTree(Tree.GetChild(0), depth + 1, conditionCount);
if (rlvr.IsValid)
{
rlvr = ValidateTree(Tree.GetChild(1), depth + 1, conditionCount);
}
}
else
{
Console.WriteLine(rlvr.ToString());
}
}
}
else
{
// this tree is null, return a "it's valid" result
rlvr = new RuleLogicValidationResult();
rlvr.ErrorType = LogicValidationErrorType.None;
rlvr.IsValid = true;
}
return rlvr;
}
Add EOF to the end of your start rule. :)

ANTLR Variable Troubles

In short: how do I implement dynamic variables in ANTLR?
I come to you again with a basic ANTLR question.
I have this grammar:
grammar Amethyst;
options {
language = Java;
}
#header {
package org.omer.amethyst.generated;
import java.util.HashMap;
}
#lexer::header {
package org.omer.amethyst.generated;
}
#members {
HashMap memory = new HashMap();
}
begin: expr;
expr: (defun | println)*
;
println:
'println' atom {System.out.println($atom.value);}
;
defun:
'defun' VAR INT {memory.put($VAR.text, Integer.parseInt($INT.text));}
| 'defun' VAR STRING_LITERAL {memory.put($VAR.text, $STRING_LITERAL.text);}
;
atom returns [Object value]:
INT {$value = Integer.parseInt($INT.text);}
| ID
{
Object v = memory.get($ID.text);
if (v != null) $value = v;
else System.err.println("undefined variable " + $ID.text);
}
| STRING_LITERAL
{
String v = (String) memory.get($STRING_LITERAL.text);
if (v != null) $value = String.valueOf(v);
else System.err.println("undefined variable " + $STRING_LITERAL.text);
}
;
INT: '0'..'9'+ ;
STRING_LITERAL: '"' .* '"';
VAR: ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')* ;
ID: ('a'..'z'|'A'..'Z'|'0'..'9')+ ;
LETTER: ('a..z'|'A'..'Z')+ ;
WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;
What it does (or should do), so far, is have a built-in "println" function to do exactly what you think it does, and a "defun" rule to define variables.
When "defun" is called on either a string or integer, the value is put into the "memory" HashMap with the first parameter being the variable's name and the second being its value.
When println is called on an atom, it should display the atom's value. The atom can be either a string or integer. It gets its value from memory and returns it. So for example:
defun greeting "Hello world!"
println greeting
But when I run this code, I get this error:
line 3:8 no viable alternative at input 'greeting'
null
NOTE: This output comes when I do:
println "greeting"
Output:
undefined variable "greeting"null
Does anyone know why this is so? Sorry if I'm not being clear, I don't understand most of this.
defun greeting "Hello world!"
println greeting
But when I run this code, I get this error:
line 3:8 no viable alternative at input 'greeting'
Because the input "greeting" is being tokenized as a VAR and a VAR is no atom. So the input defun greeting "Hello world!" is properly matched by the 2nd alternative of the defun rule:
defun
: 'defun' VAR INT // 1st alternative
| 'defun' VAR STRING_LITERAL // 2nd alternative
;
but the input println "greeting" cannot be matched by the println rule:
println
: 'println' atom
;
You must realize that the lexer does not produce tokens based on what the parser tries to match at a particular time. The input "greeting" will always be tokenized as a VAR, never as an ID rule.
What you need to do is remove the ID rule from the lexer, and replace ID with VAR inside your parser rules.

How can I build an ANTLR Works style parse tree?

I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites, and yet say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I dont understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?
What I dont understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
#parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
Id
abc
Id
abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
id
abc
id
abc123

ANTLR Grammar gives me upsidedown-tree

I have a grammar which parses dot notion expressions like this:
a.b.c
memberExpression returns [Expression value]
: i=ID { $value = ParameterExpression($i.value); }
('.' m=memberExpression { $value = MemberExpression($m.value, $i.value); }
)*
;
This parses expressions fine and gives me a tree structure like this:
MemberExpression(
MemberExpression(
ParameterExpression("c"),
"b"
)
, "a"
)
But my problem is that I want a tree that looks like this:
MemberExpression(
MemberExpression(
ParameterExpression("a"),
"b"
)
, "c"
)
for the same expression "a.b.c"
How can I achieve this?
You could do this by collecting all tokens in a java.util.List using ANTLR's convenience += operator and create the desired tree using a custom method in your #parser::members section:
// grammar def ...
// options ...
#parser::members {
private Expression customTree(List tks) {
// `tks` is a java.util.List containing `CommonToken` objects
}
}
// parser ...
memberExpression returns [Expression value]
: ids+=ID ('.' ids+=ID)* { $value = customTree($ids); }
;
I think what you are asking for is mutually left recursive, and therefore ANTLR is not a good choice to parse it.
To elaborate, you need C at the root of the tree and therefore your rule would be:
rule: rule ID;
This rule will be uncertain whether it should match
a.b
or
a.b.c