I just ran into a strange problem with ANTLR 4.2.2:
Consider a (simplified) Java grammar. This does not compile:
classOrInterfaceType
    : (classOrInterfaceType) '.' Identifier
    | Identifier
    ;
ANTLR outputs the following error:
error(119): Java.g4::: The following sets of rules are mutually left-recursive [classOrInterfaceType]
Yes, I also see a left recursion. But I do not see a mutual left recursion, only an ordinary (direct) one.
When I remove the parentheses around (classOrInterfaceType), it compiles fine. Of course, the parentheses are superfluous, but the grammar is generated automatically and the code generator always inserts parentheses in some situations. So what is the problem here?
It has been confirmed that this is a bug. The fix is scheduled for the next milestone 4.x.
See https://github.com/antlr/antlr4/issues/564
Related
I have a 4000-line text file which is parsing slowly, taking perhaps 3 minutes. I am running the IntelliJ ANTLR plugin. When I look at the profiler, I see this:
The time being consumed is the largest of all rules, by a factor of 15 or so. That's ok, the file is full of things I actually don't care about (hence 'trash'). However, the profiler says words_and_trash is ambiguous but I don't know why. Here are the productions in question. (There are many others of course...):
I have no idea why this is ambiguous. The parser isn't complaining about so_much_trash and I don't think word, trash, and OPEN_PAREN overlap.
What's my strategy for solving this ambiguity?
It's ambiguous because, given your two alternatives for words_and_trash, anything that matches the first alternative could also match the second alternative (that's the definition of ambiguity in this context).
It appears you might be using a technique common in other grammar tools to handle repetition. ANTLR can do this like so:
words_and_trash: so_much_trash+;
so_much_trash: word
    | trash
    | OPEN_PAREN words_and_trash CLOSE_PAREN
    ;
You might also find the following video useful: ANTLR4 Intellij Plugin -- Parser Preview, Parse Tree, and Profiling. It's by the author of ANTLR and covers ambiguities.
If I have a grammar where a certain expression can match two productions, I will obviously have a reduce/reduce conflict with yacc. Specifically, say I have two productions (FirstProduction and SecondProduction) where both of them could be TOKEN END.
Then yacc will not be able to know what to reduce TOKEN END to (FirstProduction or SecondProduction). However, I want to make it so that yacc prioritises FirstProduction in this situation. How can I achieve that?
Note that both FirstProduction and SecondProduction could be a great deal of things and that Body is the only place in the grammar where these conflict.
Also, I do know that in these situations, yacc will choose the first production that was declared in the grammar. However, I want to avoid having any reduce/reduce warnings.
You can refactor the grammar to not allow the second list to start with something that could be part of the first list:
Body: FirstProductionList SecondProductionList
    | FirstProductionList
    ;
FirstProductionList: FirstProductionList FirstProduction
    | /* empty */
    ;
SecondProductionList: SecondProductionList SecondProduction
    | NonFirstProduction
    ;
NonFirstProduction is any production that is unique to SecondProduction; it marks the transition from reducing FirstProductions to reducing SecondProductions.
Bison has no way to explicitly mark one production as preferred over another; the only such mechanism is precedence relations, which resolve shift/reduce conflicts. As you say, the file order provides an implicit priority. You can suppress the warning with an %expect declaration; unfortunately, that only lets you tell bison how many conflicts to expect, and not which conflicts.
I'm working on a parser for a grammar in ANTLR. I'm currently working on expressions where () has the highest order precedence, then Unary Minus, etc.
When I add the line ANTLR gives the error: The following sets of rules are mutually left-recursive [add, mul, unary, not, and, expr, paren, accessMem, relation, or, assign, equal] How can I go about solving this issue? Thanks in advance.
The easiest answer is to use ANTLR 4 rather than 3; v4 has no problem with immediate left recursion. It automatically rewrites the grammar under the covers to do the right thing. There are plenty of examples: one could, for example, examine the Java grammar or my little blog entry on left-recursive rules. If you are stuck with v3, there are earlier versions of the Java grammar and also plenty of material on how to build arithmetic expression rules in the documentation and book.
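As a rough illustration of the general idea, a directly left-recursive rule such as expr: expr '*' expr | expr '+' expr | INT can be parsed without left recursion by a loop that compares operator precedences (precedence climbing). The Python sketch below is not ANTLR's generated code; the token set, the PRECEDENCE table, and all function names are assumptions made up for this example.

```python
import re

# Hypothetical operator table for the sketch: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize_expr(text):
    """Split the input into integer, operator and parenthesis tokens."""
    return re.findall(r"\d+|[-+*/()]", text)


def parse_primary(tokens, pos):
    """Parse a parenthesised expression, a unary minus, or an integer."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok == "-":
        # Unary minus binds tighter than every binary operator here.
        operand, pos = parse_primary(tokens, pos + 1)
        return ("neg", operand), pos
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_expr(tokens, pos=0, min_prec=1):
    """Return (tree, next_pos); the loop replaces the left-recursive call."""
    left, pos = parse_primary(tokens, pos)
    while (pos < len(tokens)
           and tokens[pos] in PRECEDENCE
           and PRECEDENCE[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # Requiring a strictly higher precedence on the right-hand side
        # makes every binary operator left-associative.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos
```

With these definitions, parse_expr(tokenize_expr("1 + 2 * 3")) groups the multiplication first, and "1 - 2 - 3" groups to the left, which is the behaviour a left-recursive expression rule is normally written to express.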
I am aware what implicit token definition error in parser means, but am having difficulty getting rid of it. (v4)
stripped down statements:
enum_decl : GTYPE_ENUM ID LSQUARE STRING STRING* RSQUARE SEMI ;
string_decl: GTYPE_STRING ID (COMMA ID)* SEMI ;
In string_decl, that error appears on SEMI
In enum_decl the same error is on RSQUARE
GTYPE_ENUM, ID, etc. all are defined / accepted correctly, in the Lexer section.
Have you tried typing in just that tiny section to find a small test case that doesn't work? Without a grammar to test, there's nothing we can do. It is either a bug or a problem with your grammar.
I have a simple grammar that works for the most part, but at one place it reports error and I think it shouldn't, because it can be resolved using backtracking.
Here is the portion that is problematic.
command: object message_chain;
object: ID;
message_chain: unary_message_chain keyword_message?
| binary_message_chain keyword_message?
| keyword_message;
unary_message_chain: unary_message+;
binary_message_chain: binary_message+;
unary_message: ID;
binary_message: BINARY_OPERATOR object;
keyword_message: (ID ':' object)+;
This is a simplified version; object is more complex (it can be the result of another command, a raw value, and so on), but that part works fine. The problem is in message_chain, in the first alternative. For input like obj unary1 unary2 it works fine, but for input like obj unary1 unary2 keyword1:obj2 it tries to match keyword1 as a unary message and fails when it reaches the :. I would expect that in this situation the parser would backtrack, see the :, and recognize a keyword message.
If I make keyword message non-optional it works fine, but I need keyword message to be optional.
Parser finds keyword message if it is in second alternative (binary_message) and third alternative (just keyword_message). So something like this gives good results: 1 + 2 + 3 Keyword1:Value
What am I missing? Backtracking is set to true in options and it works fine in other cases in the same grammar.
Thanks.
This is not really a case for PEG-style backtracking, because upon failure that returns to decision points in uncompleted derivations only. For input obj unary1 unary2 keyword1:obj2, with a single token lookahead, keyword1 could be consumed by unary_message_chain. The failure may not occur before keyword_message, and next to be tried would be the second alternative of message_chain, i.e. binary_message_chain, thus missing the correct parse.
However as this grammar is LL(2), it should be possible to extend lookahead to avoid consuming keyword1 from within unary_message_chain. Have you tried explicitly setting k=2, without backtracking?
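To make the two-token lookahead idea concrete, here is a small hand-written recursive-descent sketch in Python for the simplified grammar above. It is only an illustration, not what ANTLR generates: the tokenizer, the function names, and the shape of the returned dictionary are assumptions made up for this example, and object is reduced to a bare ID as in the simplified grammar. The key point is that an ID is only consumed as a unary message when the token after it is not ':'.

```python
import re

BINARY_OPERATORS = ("+", "-", "*", "/")  # hypothetical BINARY_OPERATOR set


def tokenize_messages(text):
    """Hypothetical lexer: IDs, ':' and binary operators; anything else is skipped."""
    return re.findall(r"[A-Za-z_]\w*|:|[-+*/]", text)


def is_id(tok):
    return tok is not None and re.fullmatch(r"[A-Za-z_]\w*", tok) is not None


def parse_command(tokens):
    """command: object message_chain, decided with at most two tokens of lookahead."""
    pos = 0

    def peek(offset=0):
        i = pos + offset
        return tokens[i] if i < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect_id():
        if not is_id(peek()):
            raise SyntaxError(f"expected ID, got {peek()!r}")
        return take()

    obj = expect_id()
    unary, binary, keyword = [], [], []

    # unary_message+ : consume an ID only if the NEXT token is not ':'.
    # This second token of lookahead keeps keyword1 out of the unary chain.
    while is_id(peek()) and peek(1) != ":":
        unary.append(take())

    # binary_message+ : only tried when no unary chain was matched.
    if not unary:
        while peek() in BINARY_OPERATORS:
            op = take()
            binary.append((op, expect_id()))

    # keyword_message : (ID ':' object)+, optional after a unary or binary chain.
    while is_id(peek()) and peek(1) == ":":
        key = take()
        take()  # the ':'
        keyword.append((key, expect_id()))

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    if not (unary or binary or keyword):
        raise SyntaxError("message_chain requires at least one message")
    return {"object": obj, "unary": unary, "binary": binary, "keyword": keyword}
```

For the problematic input, parse_command(tokenize_messages("obj unary1 unary2 keyword1:obj2")) puts unary1 and unary2 in the unary chain and leaves keyword1 for the keyword message, with no backtracking needed.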