ANTLR Grammar gives me upsidedown-tree - antlr

I have a grammar which parses dot notion expressions like this:
a.b.c
memberExpression returns [Expression value]
: i=ID { $value = ParameterExpression($i.value); }
('.' m=memberExpression { $value = MemberExpression($m.value, $i.value); }
)*
;
This parses expressions fine and gives me a tree structure like this:
MemberExpression(
MemberExpression(
ParameterExpression("c"),
"b"
)
, "a"
)
But my problem is that I want a tree that looks like this:
MemberExpression(
MemberExpression(
ParameterExpression("a"),
"b"
)
, "c"
)
for the same expression "a.b.c"
How can I achieve this?

You could do this by collecting all tokens in a java.util.List using ANTLR's convenience += operator and create the desired tree using a custom method in your #parser::members section:
// grammar def ...
// options ...
#parser::members {
private Expression customTree(List tks) {
// `tks` is a java.util.List containing `CommonToken` objects
}
}
// parser ...
memberExpression returns [Expression value]
: ids+=ID ('.' ids+=ID)* { $value = customTree($ids); }
;

I think what you are asking for is mutually left recursive, and therefore ANTLR is not a good choice to parse it.
To elaborate, you need C at the root of the tree and therefore your rule would be:
rule: rule ID;
This rule will be uncertain whether it should match
a.b
or
a.b.c

Related

antlr rule boolean parameter showing up in syntactic predicate code one level higher, causing compilation errors

I have a grammar that can parse expressions like 1+2-4 or 1+x-y, creating an appropriate structure on the fly which later, given a Map<String, Integer> with appropriate content, can be evaluated numerically (after parsing is complete, i.e. for x or y only known later).
Inside the grammar, there are also places where an expression that can be evaluated on the spot, i.e. does not contain variables, should occur. I figured I could parse these with the same logic, adding a boolean parameter variablesAllowed to the rule, like so:
grammar MiniExprParser;
INT : ('0'..'9')+;
ID : ('a'..'z'| 'A'..'Z')('a'..'z'| 'A'..'Z'| '0'..'9')*;
PLUS : '+';
MINUS : '-';
numexpr returns [Double val]:
expr[false] {$val = /* some evaluation code */ 0.;};
varexpr /*...*/:
expr[true] {/*...*/};
expr[boolean varsAllowed] /*...*/:
e=atomNode[varsAllowed] {/*...*/}
(PLUS e2=atomNode[varsAllowed] {/*...*/}
|MINUS e2=atomNode[varsAllowed] {/*...*/}
)* ;
atomNode[boolean varsAllowed] /*...*/:
(n=INT {/*...*/})
|{varsAllowed}?=> ID {/*...*/}
;
result:
(numexpr) => numexpr {System.out.println("Numeric result: " + $numexpr.val);}
|varexpr {System.out.println("Variable expression: " + $varexpr.text);};
However, the generated Java code does not compile. In the part apparently responsible for the final rule's syntactic predicate, varsAllowed occurs even although the variable is never defined at this level.
/* ... */
else if ( (LA3_0==ID) && ((varsAllowed))) {
int LA3_2 = input.LA(2);
if ( ((synpred1_MiniExprParser()&&(varsAllowed))) ) {
alt3=1;
}
else if ( ((varsAllowed)) ) {
alt3=2;
}
/* ... */
Am I using it wrong? (I am using Eclipse' AntlrIDE 2.1.2 with Antlr 3.5.2.)
This problem is part of the hoisting process the parser uses for prediction. I encountered the same problem and ended up with a member var (or static var for the C target) instead of a parameter.

How do I rewrite a subtree with a composite root using ANTLR?

I have an antlr grammer with subtrees like this:
^(type ID)
that I want to convert to:
^(type DUMMY ID)
where type is 'a'|'b'.
Note: what I really want to do is convert anonymous instantiations to explicit by generating dummy names.
I've narrowed it down to the grammars below, but I'm getting this:
(a bar) (b bar)
got td
got bu
Exception in thread "main" org.antlr.runtime.tree.RewriteEmptyStreamException: rule type
at org.antlr.runtime.tree.RewriteRuleElementStream._next(RewriteRuleElementStream.java:157)
at org.antlr.runtime.tree.RewriteRuleSubtreeStream.nextNode(RewriteRuleSubtreeStream.java:77)
at Pattern.bu(Pattern.java:382)
The error message continues. My debug so far:
The input made it through the initial grammar generating two trees. a bar and b bar.
The second grammar does match the trees. it's printing td and bu.
The rewrite crashes, but I have no idea why? What does RewriteEmptyStreamException mean.
What the proper way to do this kind of a rewrite?
My main grammer Rewrite.g:
grammar Rewrite;
options {
output=AST;
}
#members{
public static void main(String[] args) throws Exception {
RewriteLexer lexer = new RewriteLexer(new ANTLRStringStream("a foo\nb bar"));
RewriteParser parser = new RewriteParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.test().getTree();
System.out.println(tree.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
Pattern p = new Pattern(nodes);
CommonTree newtree = (CommonTree) p.downup(tree);
}
}
type
: 'a'
| 'b'
;
test : id+;
id : type ID -> ^(type ID["bar"]);
DUMMY : 'dummy';
ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};
and Pattern.g
tree grammar Pattern;
options {
tokenVocab = Rewrite;
ASTLabelType=CommonTree;
output=AST;
filter=true; // tree pattern matching mode
}
topdown
: td
;
bottomup
: bu
;
type
: 'a'
| 'b'
;
td
: ^(type ID) { System.out.println("got td"); }
;
bu
: ^(type ID) { System.out.println("got bu"); }
-> ^(type DUMMY ID)
;
to do compile:
java -cp ../jar/antlr-3.4-complete-no-antlrv2.jar org.antlr.Tool Rewrite.g
java -cp ../jar/antlr-3.4-complete-no-antlrv2.jar org.antlr.Tool Pattern.g
javac -cp ../jar/antlr-3.4-complete-no-antlrv2.jar *.java
java -classpath .:../jar/antlr-3.4-complete-no-antlrv2.jar RewriteParser
EDIT 1: I have also tried using antlr4 and I get the same crash.
There are two small problems to address to get the rewrite to work, one problem in Rewrite and the other in Pattern.
The Rewrite grammar produces ^(type ID) as root elements in the output AST, as shown in the output (a bar) (b bar). A root element can't be transformed because transforming is actually a form of child-swapping: the element's parent drops the element and replaces it with the new, "transformed" version. Without a parent, you'll get the error Can't set single child to a list. Adding the root is a matter of creating an imaginary token ROOT or whatever name you like and referencing it in your entry-level rule's AST generation like so: test : id+ -> ^(ROOT id+);.
The Pattern grammar, the one producing the error you're getting, is confused by the type rule: type : 'a' | 'b' ; as part of the rewrite. I don't know the low-level details here, but apparently a tree parser doesn't maintain the state of a visited root rule like type in ^(type ID) when writing a transform (or maybe it can't or shouldn't, or maybe it's some other limitation). The easiest way to address this is with the following two changes:
Let text "a" and "b" match rule ID in the lexer by changing rule type in Rewrite from type: 'a' | 'b'; to just type: ID;.
Let rule bu in Pattern match against ^(ID ID) and transform to ^(ID DUMMY ID).
Now with a couple of minor debugging changes to Rewrite's main, input "a foo\nb bar" produces the following output:
(ROOT (a foo) (b bar))
got td
got bu
(a foo) -> (a DUMMY foo)
got td
got bu
(b bar) -> (b DUMMY bar)
(ROOT (a DUMMY foo) (b DUMMY bar))
Here are the files as I've changed them:
Rewrite.g
grammar Rewrite;
options {
output=AST;
}
tokens {
ROOT;
}
#members{
public static void main(String[] args) throws Exception {
RewriteLexer lexer = new RewriteLexer(new ANTLRStringStream("a foo\nb bar"));
RewriteParser parser = new RewriteParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.test().getTree();
System.out.println(tree.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
Pattern p = new Pattern(nodes);
CommonTree newtree = (CommonTree) p.downup(tree, true); //print the transitions to help debugging
System.out.println(newtree.toStringTree()); //print the final result
}
}
type : ID;
test : id+ -> ^(ROOT id+);
id : type ID -> ^(type ID);
DUMMY : 'dummy';
ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};
Pattern.g
tree grammar Pattern;
options {
tokenVocab = Rewrite;
ASTLabelType=CommonTree;
output=AST;
filter=true; // tree pattern matching mode
}
topdown
: td
;
bottomup
: bu
;
td
: ^(ID ID) { System.out.println("got td"); }
;
bu
: ^(ID ID) { System.out.println("got bu"); }
-> ^(ID DUMMY ID)
;
I don't have much experience with tree-patterns, with or without rewrites. But when using rewrite rules in them, I believe your options should also include rewrite=true;. The Definitive ANTLR Reference doesn't handle them, so I'm not entirely sure (have a look at the ANTLR wiki for more info).
However, for such (relatively) simple rewrites, you don't really need a separate grammar. You could make DUMMY an imaginary token and inject it in some other parser rule, like this:
grammar T;
options {
output=AST;
}
tokens {
DUMMY;
}
test : id+;
id : type ID -> ^(type DUMMY["dummy"] ID);
type
: 'a'
| 'b'
;
ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};
which would parse the input:
a bar
b foo
into the following AST:
Note that if your lexer is also meant to tokenize the input "dummy" as a DUMMY token, change the tokens { ... } block into this:
tokens {
DUMMY='dummy';
}
and you'd still be able to inject a DUMMY in other rules.

Null Pointer Exception - ANTLR TreeWalker

I keep getting a NullPoiterException in my TreeWalker but I can't seem to find out why.
I can't post the whole grammar, cause it's far too long.
This is the rule in the treeWalker where antlrWorks says the problem is:
collection_name returns [MyType value]
: ID { $value = (MyType) database.get($collection_name.text); }
;
Note that database is a HashMap.
Thank you!
I can't post the whole grammar, cause it's far too long.
The following is more "readable" and does exactly the same as your original rule:
collection_name returns [MyType value]
: ID { $value = (MyType) database.get($ID.text); }
;
Perhaps do some sanity checks:
collection_name returns [MyType value]
: ID
{
Object v = database.get($ID.text);
if(v == null) {
throw new RuntimeException($ID.text + " unknown in database!");
}
$value = (MyType) v;
}
;
EDIT
As you already found out, accessing the .text attribute of a rule is not possible in a tree grammar (only in a parser grammar). In tree grammars, every rule is of type Tree and knows a .start and .end attributes instead. Tokens can be accessed the same in both parser- and tree-grammars. So $ID.text works okay.

ANTLR Variable Troubles

In short: how do I implement dynamic variables in ANTLR?
I come to you again with a basic ANTLR question.
I have this grammar:
grammar Amethyst;
options {
language = Java;
}
#header {
package org.omer.amethyst.generated;
import java.util.HashMap;
}
#lexer::header {
package org.omer.amethyst.generated;
}
#members {
HashMap memory = new HashMap();
}
begin: expr;
expr: (defun | println)*
;
println:
'println' atom {System.out.println($atom.value);}
;
defun:
'defun' VAR INT {memory.put($VAR.text, Integer.parseInt($INT.text));}
| 'defun' VAR STRING_LITERAL {memory.put($VAR.text, $STRING_LITERAL.text);}
;
atom returns [Object value]:
INT {$value = Integer.parseInt($INT.text);}
| ID
{
Object v = memory.get($ID.text);
if (v != null) $value = v;
else System.err.println("undefined variable " + $ID.text);
}
| STRING_LITERAL
{
String v = (String) memory.get($STRING_LITERAL.text);
if (v != null) $value = String.valueOf(v);
else System.err.println("undefined variable " + $STRING_LITERAL.text);
}
;
INT: '0'..'9'+ ;
STRING_LITERAL: '"' .* '"';
VAR: ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')* ;
ID: ('a'..'z'|'A'..'Z'|'0'..'9')+ ;
LETTER: ('a..z'|'A'..'Z')+ ;
WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;
What it does (or should do), so far, is have a built-in "println" function to do exactly what you think it does, and a "defun" rule to define variables.
When "defun" is called on either a string or integer, the value is put into the "memory" HashMap with the first parameter being the variable's name and the second being its value.
When println is called on an atom, it should display the atom's value. The atom can be either a string or integer. It gets its value from memory and returns it. So for example:
defun greeting "Hello world!"
println greeting
But when I run this code, I get this error:
line 3:8 no viable alternative at input 'greeting'
null
NOTE: This output comes when I do:
println "greeting"
Output:
undefined variable "greeting"null
Does anyone know why this is so? Sorry if I'm not being clear, I don't understand most of this.
defun greeting "Hello world!"
println greeting
But when I run this code, I get this error:
line 3:8 no viable alternative at input 'greeting'
Because the input "greeting" is being tokenized as a VAR and a VAR is no atom. So the input defun greeting "Hello world!" is properly matched by the 2nd alternative of the defun rule:
defun
: 'defun' VAR INT // 1st alternative
| 'defun' VAR STRING_LITERAL // 2nd alternative
;
but the input println "greeting" cannot be matched by the println rule:
println
: 'println' atom
;
You must realize that the lexer does not produce tokens based on what the parser tries to match at a particular time. The input "greeting" will always be tokenized as a VAR, never as an ID rule.
What you need to do is remove the ID rule from the lexer, and replace ID with VAR inside your parser rules.

How can I build an ANTLR Works style parse tree?

I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites, and yet say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I dont understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?
What I dont understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
#parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
Id
abc
Id
abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
id
abc
id
abc123