left recursion with xtext - grammar

I'm having a big problem with xtext and I don't really know how to solve it, so there's a small part of the grammar I'm working with:
typename:
IDENTIFIER=IDENTIFIER | qualified_ident=qualified_ident
;
qualified_ident:
packagename "." IDENTIFIER
;
packagename:
IDENTIFIER
;
terminal IDENTIFIER:
LETTER (LETTER | DECIMAL_DIGIT)*
;
terminal LETTER:
'a' .. 'z' | 'A' .. 'Z' | "_"
;
terminal DECIMAL_DIGIT:
'0' .. '9'
;
And there's the error I get on Eclipse:
error(211): ../org.xtext.example.mydsl/src-
gen/org/xtext/example/mydsl/parser/antlr/internal/InternalMyDsl.g:7253:2:
[fatal] rule ruletypename has non-LL(*) decision due to recursive rule
invocations reachable from alts 1,2. Resolve by left-factoring or using
syntactic predicates or using backtrack=true option.
It says the grammar has left recursion but I can't see it and I don't know how to fix this. I'm having problems like this on the entire grammar but I believe if someone explain to me how to solve that one, I can figure out the rest.
Update: you can see the entire grammar here

you can change the workflow as follows
language = StandardLanguage {
name = "org.xtext.example.mydsl2.MyDsl"
...
parserGenerator = {
debugGrammar = true
}
}
and then use ANTLRworks version 3x to find out where the left recursion is

Related

Did "!", "^" and "$" had a special meaning in Antlr3?

I dont have any prior knowledge about ANTLR(I recently learned a little bit about ANTLR4), but I have to translate an old grammar to a newer version and eclipse is telling me, that their are no viable alternatives for those characters and shows the syntax error " '!' came as a complete surprise to me".
I already deleted those characters and it does not seam to be a problem, but maybe it had a special function in ANTLR3.
Thanks in advance.
global_block:
DATABASE! IDENTIFIER!
| GLOBALS! define_section!+ END! GLOBALS!
| GLOBALS! STRING!
;
main_block: MAIN sequence? END em=MAIN
-> ^(MAIN MAIN '(' ')' sequence? $em)
;
^ and -> are related to tree rewriting: https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687090/Tree+construction
ANTLR4 does not support it (v4 has listeners and visitors for tree traversal, but no rewriting anymore). Just remove all of these ! and -> ... in parser rules (do not remove the -> ... inside lexer rules like -> channel(...), which is still supported in v4).
So in your case, these rules would be valid in ANTLR4:
global_block
: DATABASE IDENTIFIER
| GLOBALS define_section+ END GLOBALS
| GLOBALS STRING
;
main_block
: MAIN sequence? END MAIN
;
The $ can still be used in ANTLR4: they are used to reference sub-rules or tokens:
expression
: lhs=expression operator=(PLUS | MINUS) rhs=expression
| NUMBER
;
so that in embedded code block, you can do: $lhs.someField.someMethod(). In your case, you can also just remove them because they are probably only used in the tree rewrite rules.
EDIT
kaby76 has a Github page with some instructions for converting grammars to ANTLR4: https://github.com/kaby76/AntlrVSIX/blob/master/doc/Import.md#antlr3

Antlr: Unintended behavior

Why this simple grammar
grammar Test;
expr
: Int | expr '+' expr;
Int
: [0-9]+;
doesn't match the input 1+1 ? It says "No method for rule expr or it has arguments" but in my opition it should be matched.
It looks like I haven't used ANTLR for a while... ANTLRv3 did not support left-recursive rules, but ANTLRv4 does support immediate left recursion. It also supports the regex-like character class syntax you used in your post. I tested this version and it works in ANTLRWorks2 (running on ANTLR4):
grammar Test;
start : expr
;
expr : expr '+' expr
| INT
;
INT : [0-9]+
;
If you add the start rule then ANTLR is able to infer that EOF goes at the end of that rule. It doesn't seem to be able to infer EOF for more complex rules like expr and expr2 since they're recursive...
There are a lot of comments below, so here is (co-author of ANTLR4) Sam Harwell's response (emphasis added):
You still want to include an explicit EOF in the start rule. The problem the OP faced with using expr directly is ANTLR 4 internally rewrote it to be expr[int _p] (it does so for all left recursive rules), and the included TestRig is not able to directly execute rules with parameters. Adding a start rule resolves the problem because TestRig is able to execute that rule. :)
I've posted a follow-up question with regard to EOF: When is EOF needed in ANTLR 4?
If your command looks like this:
grun MYGRAMMAR xxx -tokens
And this exception is thrown:
No method for rule xxx or it has arguments
Then this exception will get thrown with the rule you specified in the command above. It means the rule probably doesn't exist.
System.err.println("No method for rule "+startRuleName+" or it has arguments");
So startRuleName here, should print xxx if it's not the first (start) rule in the grammar. Put xxx as the first rule in your grammar to prevent this.

ANTLR error(99) grammar has no rules

I previously posted about my first attempt at using ANTLR when I was having issues with left recursion.
Now that I have resolved those issues, I am getting the following error when I try to use org.antlr.v4.Tool to generate the code:
error(99): C:test.g4::: grammar 'test' has no rules
What are the possible reasons for this error? Using ANTLRWorks I can certainly see rules in the Parse Tree so why can't it see them? Is it because it cannot find a suitable START rule?
I think Antlr expects the first rule name to be in small case. I was getting the same error with my grammar
grammar ee;
Condition : LogicalExpression ;
LogicalExpression : BooleanLiteral ;
BooleanLiteral : True ;
True : 'true' ;
By changing the first production rule in the grammar to lower case it solved the issue i.e. the below grammar worked for me.
grammar ee;
condition : LogicalExpression ;
LogicalExpression : BooleanLiteral ;
BooleanLiteral : True ;
True : 'true' ;
Note: It is my personal interpretation, I couldn't find this reasoning in the online documentation.
Edit: The production rules should begin with lower case letters as specified in the latest docs [1]
[1] https://github.com/antlr/antlr4/blob/master/doc/lexicon.md#identifiers
I'm not sure if you've found the solution for this, but I had the same problem and fixed it by changing my start symbol to 'prog'. So for example, the first two lines of your .g4 file would be:
grammar test;
prog : <...> ;
Where <...> will be your first derivation.
I just got that error as well (antlworks 2.1).
switching from
RULE : THIS | THAT ; to rule : this | that ; for parser rules (i.e. from uppercase to lowercase) solved the problem!
EDIT
The above correction holds only for RULE , what follows after the : can be any combination of lexer/parser rules
The most likely cause is just what the error message suggests. And the most likely reason for that is that you have not saved your grammar to the file--or if you're using ANTLRWorks2--ANTLRWorks hasn't saved your work to the file. I have no idea why ANTLRWorks doesn't save reliably.
I also got the same error but could not fix it.
I downloaded antlrworks-1.4.jar and it's working perfectly.
Download >> antlrworks-1.4.jar
Changing the first rule to start with a lower case character worked for me.
I had the same problem, and this means that your grammar has no Syntactic rules. So in order to avoid this error, you need to write at least one Syntactic rule.

Antlr 3 keywords and identifiers colliding

Surprise, I am building an SQL like language parser for a project.
I had it mostly working, but when I started testing it against real requests it would be handling, I realized it was behaving differently on the inside than I thought.
The main issue in the following grammar is that I define a lexer rule PCT_WITHIN for the language keyword 'pct_within'. This works fine, but if I try to match a field like 'attributes.pct_vac', I get the field having text of 'attributes.ac' and a pretty ANTLR error of:
line 1:15 mismatched character u'v' expecting 'c'
GRAMMAR
grammar Select;
options {
language=Python;
}
eval returns [value]
: field EOF
;
field returns [value]
: fieldsegments {print $field.text}
;
fieldsegments
: fieldsegment (DOT (fieldsegment))*
;
fieldsegment
: ICHAR+ (USCORE ICHAR+)*
;
WS : ('\t' | ' ' | '\r' | '\n')+ {self.skip();};
ICHAR : ('a'..'z'|'A'..'Z');
PCT_CONTAINS : 'pct_contains';
USCORE : '_';
DOT : '.';
I have been reading everything I can find on the topic. How the Lexer consumes stuff as it finds it even if it is wrong. How you can use semantic predication to remove ambiguity/how to use lookahead. But everything I read hasn't helped me fix this issue.
Honestly I don't see how it even CAN be an issue. I must be missing something super obvious because other grammars I see have Lexer rules like EXISTS but that doesn't cause the parser to take a string like 'existsOrNot' and spit out and IDENTIFIER with the text of 'rNot'.
What am I missing or doing completely wrong?
Convert your fieldsegment parser rule into a lexer rule. As it stands now it will accept input like
"abc
_ abc"
which is probably not what you want. The keyword "pct_contains" won't be matched by this rule since it is defined separately. If you want to accept the keyword in certain sequences as regular identifier you will have to include it in the accepted identifier rule.

ANTLR: Difference between backtrack and look-ahead?

I relative new to ANTLR. I have an very easy grammar:
start :
('A' 'B' 'C' '1'
|'A' 'B' 'C' '2'
|'A' 'B' 'C' '3'
)
;
I think that I already understand the basics of the concept of look ahead and backtracking (which works with syntactic predicates). So this grammar works with k=4 or with backtrack=true. But what is the exact difference and the main question is when do I use what? i tried to find the answer on the internet but didn't succeded.
Your grammar works in ANTLR v3 without any options.
The k option limits ANTLR to classical LL(k) parsing. Backtracking means - if the parser cannot predict, which rule to use, it just tries, backtracks and tries again.
The backtracking option you should use when ANTLR cannot build look-ahead DFA for the given grammar. ANTLR v3 can build DFAs from regular expressions pretty easy, but it has its difficulties with recursive rules. For example, this grammar works:
start: recursive_rule ';'
| recursive_rule ':'
;
recursive_rule : (ID)* '%'
;
This grammar below is the same, but expressed through recursion. ANTLR cannot build DFA for it (I actually don’t know why), so you need to switch backtracking on:
start options {backtrack=true;} : recursive_rule ';'
| recursive_rule ':'
;
recursive_rule : ID recursive_rule
|'%'
;
The k option is used to improve the parser performance. I don’t know any other reason for restricting LL(*) to LL(k).
I found a theoretical description to my question in the Book "The definitve Antlr Reference", which was also important for my understanding. Maybe some others who asks themselves similar question would help this snippet of the book too.
Page 262