antlr add syntactic predicate - antlr

For the following rule :
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* switchDefaultLabel? switchCaseLabel*)
;
I got an error:"rule switchBlockLabels has non-LL descision due to recursive rule invocations reachable from alts 1,2".And I tried to add syntactic predicate to solve this problem.I read the book "The Definitive ANTLR Reference".And Now I am confused that since there is no alternatives in rule switchBlockLabels,then no decision need to be made on which one to choose.
Is anyone can help me?

Whenever the tree parser stumbles upon, say, 2 switchCaseLabels (and no switchDefaultLabel in the middle), it does not know to which these switchCaseLabels belong. There are 3 possibilities the parser can choose from:
2 switchCaseLabels are matched by the 1st switchCaseLabel*;
2 switchCaseLabels are matched by the 2nd switchCaseLabel*;
1 switchCaseLabel is matched by the 1st switchCaseLabel*, and one by the 2nd switchCaseLabel*.
and since the parser does not like to choose for you, it emits an error.
You need to do something like this instead:
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* (switchDefaultLabel switchCaseLabel*)?)
;
That way, when there are only switchCaseLabels, and no switchDefaultLabel, these switchCaseLabels would be always matched by the first switchCaseLabel*: there is no ambiguity anymore.

Related

how do I resolve this antlr ambiguity?

I have a 4000 line text file which is parsing slowly, taking perhaps 3 minutes. I am running the Intellij Antlr plugin. When I look at the profiler, I see this:
The time being consumed is the largest of all rules, by a factor of 15 or so. That's ok, the file is full of things I actually don't care about (hence 'trash'). However, the profiler says words_and_trash is ambiguous but I don't know why. Here are the productions in question. (There are many others of course...):
I have no idea why this is ambiguous. The parser isn't complaining about so_much_trash and I don't think word, trash, and OPEN_PAREN overlap.
What's my strategy for solving this ambiguity?
It's ambiguous because, given your two alternatives for words_and_trash, anything that matches the first alternative, could also match the second alternative (that's the definition ambiguity in this context).
It appears you might be using a technique common in other grammar tools to handle repetition. ANTLR can do this like so:
words_and_trash: so_much_trash+;
so_much_trash: word
| trash
| OPEN_PAREN words_and_trash CLOSE_PAREN
;
You might also find the following video, useful: ANTLR4 Intellij Plugin -- Parser Preview, Parse Tree, and Profiling. It's by the author of ANTLR, and covers ambiguities.

ANTLR recognize single character

I'm pretty sure this isn't possible, but I want to ask just in case.
I have the common ID token definition:
ID: LETTER (LETTER | DIG)*;
The problem is that in the grammar I need to parse, there are some instructions in which you have a single character as operand, like:
a + 4
but
ab + 4
is not possible.
So I can't write a rule like:
sum: (INT | LETTER) ('+' (INT | LETTER))*
Because the lexer will consider 'a' as an ID, due to the higher priority of ID. (And I can't change that priority because it wouldn't recognize single character IDs then)
So I can only use ID instead of LETTER in that rule. It's ugly because there shouldn't be an ID, just a single letter, and I will have to do a second syntactic analysis to check that.
I know that there's nothing to do about it, since the lexer doesn't understand about context. What I'm thinking that maybe there's already built-in ANTLR4 is some kind of way to check the token's length inside the rule. Something like:
sum: (INT | ID{length=1})...
I would also like to know if there are some kind of "token alias" so I can do:
SINGLE_CHAR is alias of => ID
In order to avoid writing "ID" in the rule, since that can be confusing.
PD: I'm not parsing a simple language like this one, this is just a little example. In reality, an ID could also be a string, there are other tokens which can only be a subset of letters, etc... So I think I will have to do that second analysis anyways after parsing the entry to check that syntactically is legal. I'm just curious if something like this exists.
Checking the size of an identifier is a semantic problem and should hence be handled in the semantic phase, which usually follows the parsing step. Parse your input with the usual ID rule and check in the constructed parse tree the size of the recognized ids (and act accordingly). Don't try to force this kind of decision into your grammar.

how to skip "and" with skip rule?

I'm working on a new antlr grammar which is similar to nattys and should recognize date expressions, but I have problem with skip rules. In more detail I want to ignore useless "and"s in expressions for example:
Call Sam, John and Adam and fix a meeting with Sarah about the finance on Monday and Friday.
The first two "and"s are useless. I wrote the rule bellow to fix this problem but it didn't work, why? what should I do?
NW : [~WeekDay];
UselessAnd : AND NW -> skip;
"Useless AND" is a semantic concept.
Grammars are about syntax, and handle semantic issues poorly. Don't couple these together.
Suggestion: when you write a grammar for a language, make your parser accept the language as it is, warts and all. In your case, I suggest you "collect" the useless ANDs. That way you can get the grammar "right" more easily, and more transparently to the next coder who has to maintain your grammar.
Once you have the AST, it is pretty easy to ignore (semantically) useless things; if nothing else, you can post-process the AST and remove the useless AND nodes.

Mutually left-recursive?

I'm working on a parser for a grammar in ANTLR. I'm currently working on expressions where () has the highest order precedence, then Unary Minus, etc.
When I add the line ANTLR gives the error: The following sets of rules are mutually left-recursive [add, mul, unary, not, and, expr, paren, accessMem, relation, or, assign, equal] How can I go about solving this issue? Thanks in advance.
Easiest answer is to use antlr4 not 3, which has no problem with immediate left recursion. It automatically rewrites the grammar underneath the covers to do the right thing. There are plenty of examples. one could for example examine the Java grammar or my little blog entry on left recursive rules. If you are stuck with v3, then there are earlier versions of the Java grammar and also plenty of material on how to build arithmetic expression rules in the documentation and book.

String matching problem (can I prioritize?)

I have a (badly specified) requirement that I recognize certain keywords, but there is also provision for 'any string' ...
For instance, in the input "let's have a " I have to handle == "beer", == "curry" and == anything else at all (in theory, the keywords beer & curry have priority over all other strings).
When I try to define this, of course, I get
Decision can match input such as "'curry" using multiple alternatives: 2, 3
As a result, alternative(s) 3 were disabled for that input
I imagine this is a st00pid n00b FAQ, but don't see an obvious answer. Any help gratefully received ...
You need to apply some of the grammar disambiguation techniques that you are either learning (if this is homework). Generally speaking you add an additional rule that disambiguates the grammar. Another antlr specific thing you can do is add an action to the rule that will handle the differences.I might be able to help more if you post the antlr code in question.