I was having difficulty figuring out what ^ and ! stand for in ANTLR grammar terminology.
Have a look at the ANTLR Cheat Sheet:
! don't include in AST
^ make AST root node
And ^ can also be used in rewrite rules: ... -> ^( ... ). For example, the following two parser rules are equivalent:
expression
: A '+'^ A ';'!
;
and:
expression
: A '+' A ';' -> ^('+' A A)
;
Both create the following AST:
+
/ \
A A
In other words: the + becomes the root, the two A's become its children, and the ; is omitted from the tree.
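To make the effect of the two inline operators concrete, here is a small hand-rolled Java sketch. It does not use the ANTLR runtime; the class InlineOperatorSketch and its build method are made up for illustration, and the operator is passed as a suffix on each token string, much as it is written in the grammar. It only approximates how ^ and ! shape the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled illustration only; this is NOT the ANTLR runtime API.
class InlineOperatorSketch {
    // A minimal tree node: a label plus ordered children.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }

        // Leaves print as plain text, inner nodes as (root child child ...).
        @Override public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Each argument is a token, optionally suffixed with ^ (make root) or ! (omit).
    static String build(String... annotatedTokens) {
        Node root = null;                        // current root, if one exists yet
        List<Node> pending = new ArrayList<>();  // nodes seen before any root exists
        for (String t : annotatedTokens) {
            if (t.endsWith("!")) {
                continue;                        // '!': leave the token out of the tree
            } else if (t.endsWith("^")) {
                Node newRoot = new Node(t.substring(0, t.length() - 1));
                if (root == null) {
                    newRoot.children.addAll(pending);  // earlier siblings become children
                    pending.clear();
                } else {
                    newRoot.children.add(root);        // old tree becomes the first child
                }
                root = newRoot;
            } else if (root == null) {
                pending.add(new Node(t));
            } else {
                root.children.add(new Node(t));
            }
        }
        if (root != null) return root.toString();
        // No ^ was used: the result is a flat list of siblings.
        List<String> flat = new ArrayList<>();
        for (Node n : pending) flat.add(n.text);
        return String.join(" ", flat);
    }

    public static void main(String[] args) {
        // Corresponds to: expression : A '+'^ A ';'! ;
        System.out.println(build("A", "+^", "A", ";!"));  // prints (+ A A)
    }
}
```

The printed form (+ A A) is the same shape as the rewrite ^('+' A A): root first, then the children.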
Why this simple grammar
grammar Test;
expr
: Int | expr '+' expr;
Int
: [0-9]+;
doesn't match the input 1+1? It says "No method for rule expr or it has arguments" but in my opinion it should be matched.
It looks like I haven't used ANTLR for a while... ANTLRv3 did not support left-recursive rules, but ANTLRv4 does support immediate left recursion. It also supports the regex-like character class syntax you used in your post. I tested this version and it works in ANTLRWorks2 (running on ANTLR4):
grammar Test;
start : expr
;
expr : expr '+' expr
| INT
;
INT : [0-9]+
;
If you add the start rule then ANTLR is able to infer that EOF goes at the end of that rule. It doesn't seem to be able to infer EOF for more complex rules like expr and expr2 since they're recursive...
There are a lot of comments below, so here is the response from Sam Harwell (co-author of ANTLR 4):
You still want to include an explicit EOF in the start rule. The problem the OP faced with using expr directly is ANTLR 4 internally rewrote it to be expr[int _p] (it does so for all left recursive rules), and the included TestRig is not able to directly execute rules with parameters. Adding a start rule resolves the problem because TestRig is able to execute that rule. :)
I've posted a follow-up question with regard to EOF: When is EOF needed in ANTLR 4?
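For anyone wondering why a left-recursive rule would end up with an int parameter, and why the explicit EOF matters, here is a hand-written Java sketch of precedence climbing. It is only loosely analogous to what ANTLR 4 generates; the class PrecedenceSketch and its method names are hypothetical, and the precedence numbers are picked for illustration.

```java
// Hand-written sketch; names are hypothetical and this is NOT ANTLR-generated code.
class PrecedenceSketch {
    private final String input;
    private int pos = 0;

    PrecedenceSketch(String input) { this.input = input; }

    // start : expr EOF ;   (no parameters, so a tool can call it by name)
    String start() {
        String tree = expr(0);
        if (pos != input.length()) {             // the explicit EOF check
            throw new IllegalStateException("unexpected input at index " + pos);
        }
        return tree;
    }

    // Roughly: expr[int p] : INT ( {p <= 1}? '+' expr[2] )* ;
    // The parameter is the minimum precedence an operator needs to bind here.
    String expr(int p) {
        String left = integer();
        while (pos < input.length() && input.charAt(pos) == '+' && p <= 1) {
            pos++;                                // consume '+'
            String right = expr(2);               // tighter level keeps '+' left-associative
            left = "(+ " + left + " " + right + ")";
        }
        return left;
    }

    // INT : [0-9]+ ;
    private String integer() {
        int begin = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (begin == pos) throw new IllegalStateException("expected INT at index " + pos);
        return input.substring(begin, pos);
    }

    static String parse(String text) { return new PrecedenceSketch(text).start(); }

    public static void main(String[] args) {
        System.out.println(parse("1+1"));    // prints (+ 1 1)
        System.out.println(parse("1+2+3"));  // prints (+ (+ 1 2) 3)
    }
}
```

Calling expr(0) directly on an input like 1+1) would stop quietly after 1+1. Only the start method, with its end-of-input check, reports the stray parenthesis.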
If your command looks like this:
grun MYGRAMMAR xxx -tokens
And this exception is thrown:
No method for rule xxx or it has arguments
Then the rule you specified in the command above (xxx) could not be found as a parameterless parser rule: it probably doesn't exist, or it takes arguments. The message is printed by this line:
System.err.println("No method for rule "+startRuleName+" or it has arguments");
Here startRuleName is xxx, the rule name passed on the command line. Check that the grammar defines a parser rule with exactly that name and that the rule takes no arguments.
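The behaviour can be reproduced with plain Java reflection. This is a sketch of how a by-name lookup of a rule method might work, not a copy of the tool's source; FakeParser is a hypothetical stand-in for a generated parser class.

```java
import java.lang.reflect.Method;

// Sketch of a by-name rule lookup; FakeParser is a hypothetical stand-in for a generated parser.
class RuleLookupSketch {
    public static class FakeParser {
        public void start() { }        // parameterless rule method
        public void expr(int _p) { }   // rule method that takes an argument
    }

    // Looks for a public method with the given name and NO parameters.
    static String lookup(String startRuleName) {
        try {
            Method m = FakeParser.class.getMethod(startRuleName);
            return "ok: " + m.getName();
        } catch (NoSuchMethodException e) {
            return "No method for rule " + startRuleName + " or it has arguments";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("start"));  // found
        System.out.println(lookup("expr"));   // exists, but only with an int parameter
        System.out.println(lookup("xxx"));    // does not exist at all
    }
}
```

Both the misspelled rule and the rule with a parameter end in the same message, which is why the wording is "or it has arguments".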
I know what the caret postfix means in ANTLR (i.e. make root), but what about when the caret is a prefix, as in the following grammar I have been reading (this grammar is brand new and was written by a new team learning ANTLR)?
selectClause
: SELECT resultList -> ^(SELECT_CLAUSE resultList)
;
fromClause
: FROM tableList -> ^(FROM_CLAUSE tableList)
;
Also, I know what => means, but what about ->? What does -> imply?
thanks,
Dean
The ^ is used as an inline tree operator, indicating a certain token should become the root of the tree.
For example, the rule:
p : A B^ C;
creates the following AST:
B
/ \
A C
There's another way to create an AST which is using a rewrite rule. A rewrite rule is placed after (or at the right of) an alternative of a parser rule. You start a rewrite rule with an "arrow", ->, followed by the rules/tokens you want to be in the AST.
Take the previous rule:
p : A B C;
and you want to reverse the tokens, but keep the AST "flat" (no root node). This can be done using the following rewrite rule:
p : A B C -> C B A;
And if you want to create an AST similar to p : A B^ C;, you start your rewrite rule with ^( ... ), where the first token/rule inside the parentheses will become the root node. So the rule:
p : A B C -> ^(B A C);
produces the same AST as p : A B^ C;.
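If it helps to think of a rewrite rule as a small template applied to the matched tokens, here is a minimal Java sketch of the two forms. The helpers flat and tree are invented for this illustration and only build a printable string; they are not part of ANTLR.

```java
// Illustration only: flat(...) and tree(...) are made-up helpers, not ANTLR API.
class RewriteRuleSketch {
    // Plays the role of a rewrite without ^( ... ): siblings in the given order, no root.
    static String flat(String... nodes) {
        return String.join(" ", nodes);
    }

    // Plays the role of ^(root children...): the first element becomes the root.
    static String tree(String root, String... children) {
        if (children.length == 0) return root;
        return "(" + root + " " + String.join(" ", children) + ")";
    }

    // p : A B C -> C B A;
    static String reversed(String a, String b, String c) {
        return flat(c, b, a);
    }

    // p : A B C -> ^(B A C);
    static String rooted(String a, String b, String c) {
        return tree(b, a, c);
    }

    public static void main(String[] args) {
        System.out.println(reversed("A", "B", "C"));  // prints C B A
        System.out.println(rooted("A", "B", "C"));    // prints (B A C)
    }
}
```

Because tree returns a value that can itself be passed as a child, nesting works the same way as nested ^( ... ) in a rewrite rule.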
Related:
Tree construction
How to output the AST built using ANTLR?
Say I have the following ANTLR rules:
ROOT: 'r' ('0'..'9')*;
CHILD: 'c' ('0'..'9')*;
expression: ROOT ('.'^ CHILD)*;
For input such as r.c1.c2.c3, ANTLR would make the following tree:
.(.(.(r c1) c2) c3)
How can I represent the parent property of '.' without the ^ operator directly, i.e., in a rewrite rule?
expression: ROOT ('.' CHILD)* -> ?
The trick is to reference the tree the expression rule has built so far inside the rewrite rule (the $expression part below):
expression : (ROOT -> ROOT) ('.' CHILD -> ^('.' $expression CHILD))*;
which is equivalent to:
expression: ROOT ('.'^ CHILD)*;
Yeah, I know, it's not pretty, there is no simple syntax like you (may have) hoped for:
expression: ROOT ('.' CHILD)* -> ^(...);
See: Parr's Definitive ANTLR Reference, chapter 7, paragraph "Referencing Previous Rule ASTs in Rewrite Rules", page 174.
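The way $expression keeps wrapping the previous result is easy to see in a loop. Below is a small Java sketch, independent of ANTLR, that builds the same shape for r.c1.c2.c3; the class name and the string-based tree are assumptions made for the example, and the output uses the root-first style of ^( ... ).

```java
// Illustration only: builds a printable tree string, no ANTLR classes involved.
class PreviousTreeSketch {
    // Mirrors: expression : (ROOT -> ROOT) ('.' CHILD -> ^('.' $expression CHILD))*;
    static String expression(String input) {
        String[] parts = input.split("\\.");
        String tree = parts[0];                        // ROOT -> ROOT
        for (int i = 1; i < parts.length; i++) {
            // "tree" on the right-hand side plays the role of $expression:
            // the result so far becomes the first child of a new '.' root.
            tree = "(. " + tree + " " + parts[i] + ")";
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(expression("r.c1.c2.c3"));  // prints (. (. (. r c1) c2) c3)
    }
}
```

Each pass through the loop corresponds to one iteration of the ( ... )* subrule, which is why the tree nests to the left.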
I use ANTLRWorks for a simple grammar:
grammar boolean;
// [...]
lowercase_string
: ('a'..'z')+ ;
However, the lowercase_string doesn't match foobar according to the Interpreter (MismatchedSetException(10!={})). Ideas?
You can't use the .. operator inside parser rules like that. To match the range 'a' to 'z', create a lexer rule for it (lexer rules start with a capital letter).
Try it like this:
lowercase_string
: Lower+
;
Lower
: 'a'..'z'
;
or:
lowercase_string
: Lower
;
Lower
: 'a'..'z'+
;
Also see this previous Q&A: Practical difference between parser rules and lexer rules in ANTLR?
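The two variants accept the same text but hand the parser different token streams. Here is a rough Java sketch of that difference; it is a toy tokenizer written for this answer, with made-up method names, and not how the ANTLR lexer is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer for illustration; not the ANTLR lexer.
class LowerTokenSketch {
    // Lower : 'a'..'z' ;    one token per letter, so the parser rule needs Lower+
    static List<String> singleLetterTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            requireLower(c);
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    // Lower : 'a'..'z'+ ;   the whole run is one token, so the parser rule needs one Lower
    static List<String> wholeWordToken(String input) {
        if (input.isEmpty()) throw new IllegalArgumentException("expected at least one letter");
        for (char c : input.toCharArray()) requireLower(c);
        List<String> tokens = new ArrayList<>();
        tokens.add(input);
        return tokens;
    }

    private static void requireLower(char c) {
        if (c < 'a' || c > 'z') throw new IllegalArgumentException("not in 'a'..'z': " + c);
    }

    public static void main(String[] args) {
        System.out.println(singleLetterTokens("foobar"));  // prints [f, o, o, b, a, r]
        System.out.println(wholeWordToken("foobar"));      // prints [foobar]
    }
}
```

Which one you want depends on whether later parser rules need to see individual letters or whole words.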
grammar AdifyMapReducePredicate;
PREDICATE
: PREDICATE_BRANCH
| EXPRESSION
;
PREDICATE_BRANCH
: '(' PREDICATE (('&&' PREDICATE)+ | ('||' PREDICATE)+) ')'
;
EXPRESSION
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
Trying to interpret this in ANTLRWorks 1.4 and getting the following error:
[12:18:21] error(211): <notsaved>:1:8: [fatal] rule Tokens has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
[12:18:21] Interpreting...
When I interpret, I'm trying to interpret a PREDICATE and my test case is (A||B)
What am I missing?
By ANTLR's conventions, parser rule names start with a lower case letter, while lexer rules start with capital letters. So the grammar, as you wrote it, has three lexer rules, defining tokens. This may not be what you want.
The reason for the error message apparently is the ambiguity between these tokens: your input pattern matches the definitions of both PREDICATE and PREDICATE_BRANCH.
Just use names starting in lower case letters instead of PREDICATE and PREDICATE_BRANCH. You may also have to add an extra rule for the target symbol, that is not directly involved in the recursion.
By the way, this grammar is recursive, but not left-recursive, and when using parser rules, it definitely is LL(1).
You don't have a parser rule (parser rules start with a lower case letter), although I'm not sure that last part is necessary when interpreting some test cases in ANTLRWorks.
Anyway, try something like this:
grammar AdifyMapReducePredicate;
parse
: (p=predicate {System.out.println("parsed :: "+$p.text);})+ EOF
;
predicate
: expression
;
expression
: booleanExpression
;
booleanExpression
: atom ((AND | OR) atom)*
;
atom
: ID
| '(' predicate ')'
;
AND
: '&&'
;
OR
: '||'
;
ID
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
SPACE
: (' ' | '\t' | '\r' | '\n') {skip();}
;
With the following test class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // Two predicates in a row; the parse rule prints each one as it is matched.
        ANTLRStringStream in = new ANTLRStringStream("(A || B) (C && (D || F || G))");
        AdifyMapReducePredicateLexer lexer = new AdifyMapReducePredicateLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        AdifyMapReducePredicateParser parser = new AdifyMapReducePredicateParser(tokens);
        parser.parse();
    }
}
which after generating a lexer & parser (a), compiling all .java files (b) and running the test class (c), produces the following output:
parsed :: (A||B)
parsed :: (C&&(D||F||G))
a
java -cp antlr-3.2.jar org.antlr.Tool AdifyMapReducePredicate.g
b
javac -cp antlr-3.2.jar *.java
c (*nix/MacOS)
java -cp .:antlr-3.2.jar Main
c (Windows)
java -cp .;antlr-3.2.jar Main
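If you want to see the shape of the parser rules without generating anything, here is a hand-written recursive-descent sketch of the same rules in plain Java. It is an approximation: the class PredicateSketch is hypothetical, whitespace is stripped up front instead of being lexed (so two IDs separated only by a space would merge, unlike with the real SPACE rule), and the generated ANTLR parser is structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written recognizer for illustration; NOT the parser ANTLR generates.
class PredicateSketch {
    private final String in;   // input with whitespace removed (simplification of SPACE skip())
    private int pos = 0;

    PredicateSketch(String text) { this.in = text.replaceAll("[ \\t\\r\\n]", ""); }

    // parse : predicate+ EOF ;   returns the matched text of each predicate
    static List<String> parse(String text) {
        PredicateSketch p = new PredicateSketch(text);
        List<String> matched = new ArrayList<>();
        do {
            int start = p.pos;
            p.booleanExpression();   // predicate -> expression -> booleanExpression
            matched.add(p.in.substring(start, p.pos));
        } while (p.pos < p.in.length());
        return matched;
    }

    // booleanExpression : atom ((AND | OR) atom)* ;
    private void booleanExpression() {
        atom();
        while (in.startsWith("&&", pos) || in.startsWith("||", pos)) {
            pos += 2;
            atom();
        }
    }

    // atom : ID | '(' predicate ')' ;
    private void atom() {
        if (pos < in.length() && in.charAt(pos) == '(') {
            pos++;
            booleanExpression();
            expect(')');
        } else {
            id();
        }
    }

    // ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
    private void id() {
        int start = pos;
        while (pos < in.length() && isIdChar(in.charAt(pos), pos == start)) pos++;
        if (pos == start) throw new IllegalStateException("expected ID at index " + pos);
    }

    private static boolean isIdChar(char c, boolean first) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        return letter || (!first && c >= '0' && c <= '9');
    }

    private void expect(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        for (String s : parse("(A || B) (C && (D || F || G))")) {
            System.out.println("parsed :: " + s);
        }
    }
}
```

Each grammar rule maps to one method, and the recursion from atom back into booleanExpression is what handles the nested parentheses.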
HTH