caret prefix instead of postfix in antlr

I know what the caret postfix means in ANTLR (i.e., make root), but what does it mean when the caret is a prefix, as in the following grammar I have been reading? (The grammar is brand new and was written by a team that is new to ANTLR.)
selectClause
: SELECT resultList -> ^(SELECT_CLAUSE resultList)
;
fromClause
: FROM tableList -> ^(FROM_CLAUSE tableList)
;
Also, I know what => means, but what about ->? What does -> imply?
thanks,
Dean

The ^ is used as an inline tree operator, indicating a certain token should become the root of the tree.
For example, the rule:
p : A B^ C;
creates the following AST:
B
/ \
A C
There's another way to create an AST which is using a rewrite rule. A rewrite rule is placed after (or at the right of) an alternative of a parser rule. You start a rewrite rule with an "arrow", ->, followed by the rules/tokens you want to be in the AST.
Take the previous rule:
p : A B C;
and you want to reverse the tokens, but keep the AST "flat" (no root node). This can be done using the following rewrite rule:
p : A B C -> C B A;
And if you want to create an AST similar to p : A B^ C;, you start your rewrite rule with ^( ... ), where the first token/rule inside the parentheses will become the root node. So the rule:
p : A B C -> ^(B A C);
produces the same AST as p : A B^ C;.
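To make the difference between the two rewrite rules concrete, here is a small Python sketch. It is only a toy model of the resulting tree shapes, not ANTLR's runtime API; the helper names tree and to_string are made up for this illustration.

```python
# Toy model of the two rewrite rules; this is NOT ANTLR's runtime API.
def tree(root, *children):
    """Model ^(root child ...) as a tuple whose first element is the root."""
    return (root,) + children

def to_string(node):
    """Render a node in a LISP-like notation: (root child child ...)."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_string(part) for part in node) + ")"

a, b, c = "A", "B", "C"      # the tokens matched by  p : A B C

flat = [c, b, a]             # p : A B C -> C B A     (flat list, no root)
rooted = tree(b, a, c)       # p : A B C -> ^(B A C)  (B is the root)

print(" ".join(flat))        # C B A
print(to_string(rooted))     # (B A C)
```

In this model the flat rewrite is just a reordered list, while ^( ... ) nests the remaining tokens under the first one.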
Related:
Tree construction
How to output the AST built using ANTLR?

Related

Formal Languages - Grammar

I am taking a Formal Languages and Computability class and am having a little trouble understanding the concept of grammar. One of my assignment questions is this:
Take ∑ = {a,b}, and let na(w) and nb(w) denote the number of a's and b's in the string w, respectively. Then the grammar G with productions:
S -> SS
S -> λ
S -> aSb
S -> bSa
generates the language L = {w: na(w) = nb(w)}.
1) The language in the example contains an empty string. Modify the given grammar so that it generates L - {λ}.
I am thinking that I should modify the condition of L, something like:
L = {w: na(w) = nb(w), na, nb > 0}
That way, we indicate that the string is never empty.
2) Modify the grammar in the example so that it will generate L ∪ {a^n b^(n+1) : n >= 0}.
I am not sure on how to do this one. Should that mean I make one more condition in the grammar, adding something like S -> aSbb?
Any explanation about these two questions would be greatly appreciated. I'm still trying to figure this grammar stuff out, so I am not sure about my answers.
1) The question is about modifying the grammar to obtain a new language, so don't modify the language directly…
Your grammar generates the empty word because of the production:
S -> λ
So you could think of removing this production altogether. This yields the following grammar:
S -> SS
S -> aSb
S -> bSa
Unfortunately, this grammar generates no strings at all (its language is empty). As in an induction without a base case, there is no production consisting only of terminals, so no derivation can ever terminate. To fix this, add the following productions:
S -> ab
S -> ba
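If you want to convince yourself that the five productions (S -> SS, S -> aSb, S -> bSa, S -> ab, S -> ba) really give L - {λ}, a brute-force check on short strings is easy to write. The following is a minimal Python sketch; the function name generate and the length bound 6 are arbitrary choices for this illustration, and the pruning only works because no production shrinks a sentential form.

```python
from itertools import product

def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Assumes no production shrinks a sentential form (no lambda rules),
    so any form longer than max_len can be discarded safely.
    Non-terminals are upper-case letters, terminals are lower-case.
    """
    seen, frontier, words = {start}, [start], set()
    while frontier:
        form = frontier.pop()
        # Position of the leftmost non-terminal, if any.
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:
            words.add(form)          # only terminals left: a generated word
            continue
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words

grammar = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}
generated = generate(grammar, "S", 6)

# Every non-empty string over {a, b} of length <= 6 with as many a's as b's.
expected = {
    "".join(t)
    for n in range(1, 7)
    for t in product("ab", repeat=n)
    if t.count("a") == t.count("b")
}

print(generated == expected)
```

This only checks strings up to the chosen bound, so it is a sanity check rather than a proof.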
2) Don't randomly try to add production rules in the hope that it's going to work. Here you want a's followed by b's. So the production rule
S -> bSa
must certainly disappear. Also, the rule
S -> SS
would produce, e.g., abab (try to see how this is obtained). So we'll have to remove it too. We're left with:
S -> λ
S -> aSb
Now this grammar generates:
λ
ab
aabb
aaabbb
etc. That's not bad at all! To get an extra trailing b, we could create a new non-terminal, say T, replace our current S by T, and add that trailing b in S:
T -> λ
T -> aTb
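S -> Tb
Note that these three productions cover only the a^n b^(n+1) part. For the full union asked for in the question, keep the original four productions under their own non-terminal (say E) and give the start symbol two alternatives: S -> E and S -> Tb.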
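To see how the T productions together with S -> Tb yield a^n b^(n+1), it can help to write a derivation out step by step. Here is a minimal Python sketch that does this mechanically; the function name derive is made up for this illustration.

```python
def derive(n):
    """Sentential forms of the derivation that uses T -> aTb exactly n times."""
    form = "S"
    steps = [form]
    form = form.replace("S", "Tb")        # S -> Tb
    steps.append(form)
    for _ in range(n):
        form = form.replace("T", "aTb")   # T -> aTb
        steps.append(form)
    form = form.replace("T", "")          # T -> λ
    steps.append(form)
    return steps

print(" => ".join(derive(2)))   # S => Tb => aTbb => aaTbbb => aabbb
```

Each application of T -> aTb adds one a and one b, and the single b from S -> Tb is what makes the b's outnumber the a's by exactly one.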
I know that this is homework; I gave you the solutions to your homework: that's because, from the way you asked your question, it seems you're completely lost. I hope this answer will help you get on the right path!

complex AST rewrite rule in ANTLR

This follows up on my earlier problem with AST rewrite rules and the divide-into-groups technique, in the question AST rewrite rule with " * +" in antlr.
I am having trouble with AST generation in ANTLR again :). Here is my ANTLR code:
start : noun1+=n (prep noun2+=n (COMMA noun3+=n)*)*
-> ^(NOUN $noun1) (^(PREP prep) ^(NOUN $noun2) ^(NOUN $noun3)*)*
;
n : 'noun1'|'noun2'|'noun3'|'noun4'|'noun5';
prep : 'and'|'in';
COMMA : ',';
Now, with the input "noun1 and noun2, noun3 in noun4, noun5", I got the following unexpected AST:
Compare with the "Parse Tree" in ANTLRWorks:
I think the $noun3 variable holds the list of all "n" matched by "COMMA noun3+=n". Consequently, the rewrite ^(NOUN $noun3)* emits all of those "n" together, without separating which "n" actually belongs to which "prep".
Is there any way to get that separation in "(^(PREP prep) ^(NOUN $noun2) ^(NOUN $noun3))"? All I want is for the AST to match the "Parse Tree" in ANTLRWorks exactly, but without the COMMA tokens.
Thanks for the help!
Getting the separation that you want is easiest if you break up the start rule. Here's an example (without writing COMMAs to the AST):
start : prepphrase //one prepphrase is required.
(COMMA! prepphrase)* //"COMMA!" means "match a COMMA but don't write it to the AST"
;
prepphrase: noun1=n //You can use "noun1=n" instead of "noun1+=n" when you're only using it to store one value
(prep noun2=n)?
-> ^(NOUN $noun1) ^(PREP prep)? ^(NOUN $noun2)?
;
A prepphrase is a noun that may be followed by a preposition with another noun. The start rule looks for comma-separated prepphrases.
The output appears like the parse tree image, but without the commas.
If you prefer explicitly writing out ASTs with -> or if you don't like syntax like COMMA!, you can write the start rule like this instead. The two different forms are functionally equivalent.
start : prepphrase //one prepphrase is required.
(COMMA prepphrase)*
-> prepphrase+ //write each prepphrase, which doesn't include commas
;
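To illustrate the shape of the output, here is a small Python sketch that mimics what the two rules do to the example input. It is a toy model under simplifying assumptions, not generated ANTLR code: the function names mirror the rule names, the tokenizer is a plain regular expression, and there is no error handling for malformed input.

```python
import re

def prepphrase(tokens):
    """noun1=n (prep noun2=n)?  ->  ^(NOUN $noun1) ^(PREP prep)? ^(NOUN $noun2)?"""
    nodes = [("NOUN", tokens[0])]
    if len(tokens) == 3:                       # the optional (prep noun2=n) part
        nodes += [("PREP", tokens[1]), ("NOUN", tokens[2])]
    return nodes

def start(text):
    """prepphrase (COMMA prepphrase)*  ->  prepphrase+"""
    tokens = re.findall(r"\w+|,", text)
    phrases, current = [], []
    for tok in tokens:
        if tok == ",":                         # COMMA separates phrases, not written out
            phrases.append(current)
            current = []
        else:
            current.append(tok)
    phrases.append(current)
    return [node for phrase in phrases for node in prepphrase(phrase)]

for node in start("noun1 and noun2, noun3 in noun4, noun5"):
    print(node)
```

Because each prepphrase is rewritten on its own, every PREP stays next to the nouns it was matched with, which is the separation the single big rule could not express.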

How to represent multiple parents as rewrite rule?

Say I have the following ANTLR rule:
ROOT: 'r' ('0'..'9')*;
CHILD: 'c' ('0'..'9')*;
expression: ROOT ('.'^ CHILD)*;
For input such as r.c1.c2.c3, ANTLR would make the following tree:
.(.(.(r c1) c2) c3)
How can I represent the parent property of '.' without the ^ operator directly, i.e., in a rewrite rule?
expression: ROOT ('.' CHILD)* -> ?
The trick is to reference the AST that the expression rule has built so far from inside the rewrite rule (the $expression part below):
expression : (ROOT -> ROOT) ('.' CHILD -> ^('.' $expression CHILD))*;
which is equivalent to:
expression: ROOT ('.'^ CHILD)*;
Yeah, I know, it's not pretty, there is no simple syntax like you (may have) hoped for:
expression: ROOT ('.' CHILD)* -> ^(...);
See: Parr's Definitive ANTLR Reference, chapter 7, paragraph "Referencing Previous Rule ASTs in Rewrite Rules", page 174.
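The effect of $expression is easiest to see as a left fold: each iteration of the ( ... )* loop wraps the tree built so far in a new '.' root. Below is a minimal Python sketch of that idea; it is a toy model rather than ANTLR output, and the names tokenize, expression and to_string are made up for this illustration.

```python
import re

def tokenize(text):
    """Split input like r.c1.c2 into ROOT/CHILD tokens and '.' tokens."""
    return re.findall(r"[rc][0-9]*|\.", text)

def expression(tokens):
    tree = tokens[0]                           # (ROOT -> ROOT)
    for i in range(1, len(tokens), 2):
        dot, child = tokens[i], tokens[i + 1]
        tree = (dot, tree, child)              # -> ^('.' $expression CHILD)
    return tree

def to_string(node):
    """Render a node in a LISP-like notation: (root child child ...)."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_string(part) for part in node) + ")"

print(to_string(expression(tokenize("r.c1.c2.c3"))))   # (. (. (. r c1) c2) c3)
```

The previous value of tree plays the role of $expression, which is why the first dot ends up deepest in the tree.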

What does ^ and ! stand for in ANTLR grammar

I was having difficulty figuring out what ^ and ! stand for in ANTLR grammar terminology.
Have a look at the ANTLR Cheat Sheet:
! don't include in AST
^ make AST root node
And ^ can also be used in rewrite rules: ... -> ^( ... ). For example, the following two parser rules are equivalent:
expression
: A '+'^ A ';'!
;
and:
expression
: A '+' A ';' -> ^('+' A A)
;
Both create the following AST:
+
/ \
A A
In other words: the + becomes the root, the two A's are its children, and the ; is omitted from the tree.

ANTLR - Implicit AND Tokens In Tree

I’m trying to build a grammar that interprets user-entered text, search-engine style. It will support the AND, OR, NOT and ANDNOT Boolean operators. I have pretty much everything working, but I want to add a rule that two adjacent keywords outside of a quoted string implicitly are treated as in an AND clause. For example:
cheese and crackers = cheese AND crackers
(up and down) or (left and right) = (up AND down) OR (left AND right)
cat dog “potbelly pig” = cat AND dog AND “potbelly pig”
I’m having trouble with the last one, and I’m hoping somebody can point me in the right direction. Here’s my *.g file thus far, and please be nice, my ANTLR experience spans less than a work day:
grammar SearchEngine;
options { language = CSharp2; output = AST; }
@lexer::namespace { Demo.SearchEngine }
@parser::namespace { Demo.SearchEngine }
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : ('A'|'a')('N'|'n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
fragment QUOTE : ('"');
fragment SPACE : (' '|'\n'|'\r'|'\t'|'\u000C');
WS : (SPACE) { $channel=HIDDEN; };
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD : (CHARACTER)+;
startExpression : andExpression;
andExpression : andnotExpression (AND^ andnotExpression)*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
orExpression : notExpression (OR^ notExpression)*;
notExpression : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
Since your AND-rule has the optional AND-keyword, you should create an imaginary AND-token and use a rewrite-rule to "inject" that token in your tree. In this case, you can't make use of ANTLR's short-hand ^ root-operator. You'll have to use the -> rewrite operator.
Your andExpression should look like:
andExpression
: (andnotExpression -> andnotExpression)
(AND? a=andnotExpression -> ^(AndNode $andExpression $a))*
;
A detailed description of this (perhaps cryptic) notation is given in Chapter 7, section Rewrite Rules in Subrules, pages 173-174 of The Definitive ANTLR Reference by Terence Parr.
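If the notation is hard to read, it may help to think of it as a loop that keeps wrapping the tree built so far. Here is a minimal Python sketch of just that part; it is a toy model, not the generated parser. It takes an already tokenized list, treats every operand as a plain string, and ignores the other operators and parentheses. The function name and_expression is made up for this illustration.

```python
def and_expression(tokens):
    """Fold operands under imaginary AndNode roots; the AND keyword is optional."""
    operands = [t for t in tokens if t.upper() != "AND"]   # AND? may or may not be present
    tree = operands[0]                    # (andnotExpression -> andnotExpression)
    for a in operands[1:]:
        tree = ("AndNode", tree, a)       # -> ^(AndNode $andExpression $a)
    return tree

print(and_expression(["cat", "dog", '"potbelly pig"']))
print(and_expression(["cheese", "and", "crackers"]))
```

In this model, adjacent keywords and keywords joined by an explicit AND produce the same tree shape, which is the behaviour the rewrite rule is after.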
I ran a quick test to see if the grammar produces the proper AST with the new andExpression rule. After parsing the string cat dog "potbelly and pig" and FOO, the generated parser produced the following AST:
[image: http://img580.imageshack.us/img580/7370/andtree.png]
Note that the AndNode and Root are imaginary tokens.
If you want to know how to create the AST picture above, see this thread: Visualizing an AST created with ANTLR (in a .Net environment)
EDIT
When parsing both one two three and (one two) three, the following AST is created:
[image: http://img203.imageshack.us/img203/2558/69551879.png]
And when parsing (one two) OR three, the following AST is created:
[image: http://img340.imageshack.us/img340/8779/73390353.png]
which seems to be the proper way in all cases.