enterDecision(int) in the type DebugEventListener is not applicable for the arguments (int, boolean)? - antlr

I am using ANTLR 3.1.3 to generate the parser. After importing the generated testParser, I found several errors like:
try { dbg.enterDecision(2, decisionCanBacktrack[2]);
The method enterDecision(int) in the type DebugEventListener is not applicable for the arguments (int, boolean) testParser.java /ANTLRTest/src line 280 Java Problem
If I changed to dbg.enterDecision(2), then everything is fine.
The grammar is as follows,
grammar Test;
options {output=AST;}
expr : mexpr (PLUS^ mexpr)* SEMI! ;
mexpr : atom (STAR^ atom)* ;
atom: INT ;
//class csharpTestLexer extends Lexer;
WS : (' ' | '\t' | '\n' | '\r') { $channel = HIDDEN; } ;
LPAREN: '(' ;
RPAREN: ')' ;
STAR: '*' ;
PLUS: '+' ;
SEMI: ';' ;
DIGIT : '0'..'9' ;
INT : (DIGIT)+ ;
I am using ANTLRWorks 1.4.3 to generate lexer and parser.
JDK 1.6
What causes this error?

It looks like you've generated a lexer and parser with an ANTLR version that is different than the one you added to Eclipse's classpath.
If you generate a lexer and/or parser with ANTLRWorks 1.4.3 (which contains ANTLR 3.4), you should also add ANTLR 3.4 to your project's build path in Eclipse and remove ANTLR 3.1.3 from it.
BTW, I get an error for 2 + 2 * 3. Do you know why? Is anything wrong with the above grammar?
That is because single digit numbers are being tokenized as DIGIT tokens. Either make DIGIT a fragment:
fragment DIGIT : '0'..'9' ;
INT : (DIGIT)+ ;
or remove it:
INT : '0'..'9'+ ;
See: What does "fragment" mean in ANTLR?
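The tie-break described above can be illustrated with a small plain-Java model. This is only a sketch and not ANTLR's actual implementation: it assumes a lexer that takes the longest match and, on a tie, the rule listed first. The class and method names (LongestMatchDemo, winningRule, originalRules, fixedRules) are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of rule selection: longest match wins, ties go to the rule listed first. */
final class LongestMatchDemo {

    /** Returns the name of the rule that wins for a token starting at offset 0 of input. */
    static String winningRule(Map<String, Pattern> rulesInOrder, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rulesInOrder.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors at the start; the strict ">" keeps the earlier rule on a tie
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    /** Rules as in the question: DIGIT is a real token rule and is listed before INT. */
    static Map<String, Pattern> originalRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("DIGIT", Pattern.compile("[0-9]"));
        rules.put("INT", Pattern.compile("[0-9]+"));
        return rules;
    }

    /** Rules after the fix: DIGIT is a fragment, so only INT can produce a token. */
    static Map<String, Pattern> fixedRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("INT", Pattern.compile("[0-9]+"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winningRule(originalRules(), "2"));   // DIGIT
        System.out.println(winningRule(originalRules(), "22"));  // INT
        System.out.println(winningRule(fixedRules(), "2"));      // INT
    }
}
```

With the original rule order, a single digit such as 2 comes out as DIGIT, so the parser rule atom, which expects INT, cannot match it.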


ANTLR4 grammar rule that cannot be reached from start rule affects language

The following minimal grammar shows the issue:
grammar test;
call : exp LP exp RP ;
exp : exp LP exp RP | ID;
ID : [a-z] ;
LP : '(' ;
RP : ')' ;
Newline : '\r\n' | '\n' ;
If I use call as the start rule, then the generated parser will gladly parse the following input:
(I tried it in ANTLR lab, which probably uses the Java target, and locally using the C++ target with ANTLR 4.9.3.)
If I now add the following rule to the grammar, but keep call as the start rule, then the same input does not match call anymore.
callWithNewline : call Newline;
Why does callWithNewline affect whether call matches?
If I change the input to have a newline character after it, it will suddenly match call in ANTLR lab (even though the newline is of course not part of the match), but not with the C++ target, so the targets behave slightly differently here.
I ran into this behavior while unit testing subrules. Parsing a full grammar that contains this kind of subgrammar somewhere lower in the hierarchy does not appear to cause issues.
The issue still occurs if I remove the ambiguity:
grammar test;
callWithNewline : call Newline ;
call : exp LP ID RP ;
exp : exp LP ID RP | ID;
ID : [a-z] ;
LP : '(' ;
RP : ')' ;
Newline : '\r\n' | '\n' ;

ANTLR proper ordering of grammar rules

I am trying to write a grammar that will recognize <<word>> as a special token but treat <word> as just a regular literal.
Here is my grammar:
grammar test;
doc: item+ ;
item: func | atom ;
func: '<<' WORD '>>' ;
atom: PUNCT+ #punctAtom
| NEWLINE+ #newlineAtom
| WORD #wordAtom
;
WS : [ \t] -> skip ;
NEWLINE : [\n\r]+ ;
PUNCT : [.,?!]+ ;
fragment CHAR : (LETTER | DIGIT | SYMB | PUNCT) ;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9] ;
fragment SYMB : ~[a-zA-Z0-9.,?! |{}\n\r\t] ;
So something like <<word>> will be matched by two rules, both func and atom. I want it to be recognized as a func, so I put the func rule first.
When I test my grammar with <word> it treats it as an atom, as expected. However when I test my grammar and give it <<word>> it treats it as an atom as well.
Is there something I'm missing?
PS - I have separated atom into PUNCT, NEWLINE, and WORD and given them labels #punctAtom, #newlineAtom, and #wordAtom because I want to treat each of those differently when I traverse the parse tree. Also, a WORD can contain PUNCT because, for instance, someone can write "Hello," and I want to treat that as a single word (for simplicity later on).
PPS - One thing I've tried is I've included < and > in the last rule, which is a list of symbols that I'm "disallowing" to exist inside a WORD. This solves one problem, in that <<word>> is now recognized as a func, but it creates a new problem because <word> is no longer accepted as an atom.
ANTLR's lexer tries to match as many characters as possible, so both <<WORD>> and <WORD> are matched by the lexer rule WORD. Therefore, in these cases the tokens << and >> (or < and >, for that matter) will not be created.
You can see what tokens are being created by running these lines of code:
Lexer lexer = new testLexer(CharStreams.fromString("<word> <<word>>"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
// Load all tokens into the buffer; getTokens() only returns what has been fetched so far
tokens.fill();
for (Token t : tokens.getTokens()) {
System.out.printf("%-20s %s\n", testLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
which will print:
WORD <word>
WORD <<word>>
What you could do is something like this:
func : '<<' WORD '>>' ;
atom : PUNCT+ #punctAtom
| NEWLINE+ #newlineAtom
| word #wordAtom
;
word : WORD
| '<' WORD '>'
;
fragment SYMB : ~[<>a-zA-Z0-9.,?! |{}\n\r\t] ;
Of course, something like foo<bar will not become a single WORD, which it previously would.

Trouble migrating antlr grammar

I have never used ANTLR before, but now I have to migrate a grammar from an older version to the latest. I am trying to generate a lexer and parser for the C# target. I am stuck on migrating the start rule seen below.
grammar expr;
DQUOTE: '\"';
SQUOTE: '\'';
NEG : '-';
PLUS : '+';
OPEN : '(';
CLOSE : ')';
PERIOD: '.';
COMMA : ',';
start returns [Expression value]
: expression EOF { $value = $expression.value; }
;
expression returns [Expression value]
: literal { $value = $literal.value; }
| name { $value = $name.value; }
| functionCall { $value = $functionCall.value; }
;
I get the following error.
syntax error:
mismatched input '[Expression value]' expecting ARG_ACTION while
matching a rule.
I have already come across the post "Troubles with returns declaration on the first parser rule in an ANTLR4 grammar", but Sam's response has not helped me figure out what I should change in my case.
I would appreciate it if anyone could tell me the equivalent of the start rule in the latest grammar syntax.
The answer you linked appears to be applicable to your case. Move lexer rules (i.e. those starting with uppercase letters, DQUOTE and so on) after parser rules like start.

ANTLR4 Negative lookahead workaround?

I'm using ANTLR4 and I'm trying to make a parser for Matlab. One of the main issues there is that strings and transpose both use single quotes. The solution I had in mind was to define the STRING lexer rule in roughly the following manner:
(if the previous token is not ')', '}', ']' or [a-zA-Z0-9]) then match '\'' ( ESC_SEQ | ~('\\'|'\''|'\r'|'\n') )* '\'' (but note that I do not want to consume the previous token if the condition is true).
Does anyone know a workaround for this problem, given that ANTLR4 does not support negative lookaheads?
You can do this kind of negative lookahead in ANTLR4 with a semantic predicate that reads _input.LA(-1), i.e. the previous character (in Java, see how to resolve simple ambiguity or ANTLR4 negative lookahead in lexer).
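As a rough illustration of the check such a predicate would perform, here is a plain-Java sketch that works on a string rather than on ANTLR's character stream. The class and method names (QuoteContext, isTranspose) are hypothetical; in a generated Java lexer the previous character would come from _input.LA(-1) instead of src.charAt(...).

```java
/** Plain-Java sketch of the "look at the previous character" check from the question. */
final class QuoteContext {

    /**
     * True if the single quote at quoteIndex directly follows ')', '}', ']' or an
     * ASCII letter/digit, i.e. it should be read as transpose rather than a string start.
     */
    static boolean isTranspose(String src, int quoteIndex) {
        if (quoteIndex <= 0) {
            return false; // nothing precedes the quote, so treat it as the start of a string
        }
        char prev = src.charAt(quoteIndex - 1);
        return prev == ')' || prev == '}' || prev == ']' || isAsciiAlnum(prev);
    }

    /** Matches [a-zA-Z0-9], the character class named in the question. */
    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(isTranspose("x = a';", 5));    // true: the quote follows 'a'
        System.out.println(isTranspose("s = 'abc';", 4)); // false: the quote follows a space
    }
}
```

The check only looks at the character immediately before the quote and consumes nothing, which matches the requirement of not consuming the previous token.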
You can also use lexer modes to deal with this kind of thing, but then your lexer has to be defined in its own file. The idea is to switch from a state that can match some tokens to another state that can match different ones.
Here is an example from ANTLR4 lexer documentation:
// Default "mode": Everything OUTSIDE of a tag
COMMENT : '<!--' .*? '-->' ;
CDATA : '<![CDATA[' .*? ']]>' ;
OPEN : '<' -> pushMode(INSIDE) ;
XMLDeclOpen : '<?xml' S -> pushMode(INSIDE) ;
// ----------------- Everything INSIDE of a tag ------------------ ---
mode INSIDE;
CLOSE : '>' -> popMode ;
SPECIAL_CLOSE: '?>' -> popMode ; // close <?xml...?>
SLASH_CLOSE : '/>' -> popMode ;
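To make the mode-switching idea concrete, here is a small plain-Java sketch of a mode stack. It is a simplified model and not ANTLR's implementation: it handles one character at a time, and the token names NAME and TEXT as well as the class ModeStackDemo are invented for this illustration (only OPEN and CLOSE come from the grammar above).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy tokenizer showing how the same character yields different tokens depending on the mode. */
final class ModeStackDemo {

    enum Mode { DEFAULT, INSIDE }

    static List<String> tokenize(String input) {
        Deque<Mode> modes = new ArrayDeque<>();
        modes.push(Mode.DEFAULT);
        List<String> out = new ArrayList<>();
        for (char c : input.toCharArray()) {
            Mode mode = modes.peek();
            if (mode == Mode.DEFAULT && c == '<') {
                out.add("OPEN");
                modes.push(Mode.INSIDE);   // corresponds to "-> pushMode(INSIDE)"
            } else if (mode == Mode.INSIDE && c == '>') {
                out.add("CLOSE");
                modes.pop();               // corresponds to "-> popMode"
            } else if (mode == Mode.INSIDE) {
                out.add("NAME(" + c + ")");
            } else {
                out.add("TEXT(" + c + ")"); // in DEFAULT mode, '>' is ordinary text
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // [TEXT(a), TEXT(>), TEXT(b), OPEN, NAME(c), CLOSE, TEXT(d)]
        System.out.println(tokenize("a>b<c>d"));
    }
}
```

The first '>' is plain text because the tokenizer is still in the default mode, while the second one becomes CLOSE because it is seen inside a tag.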

ANTLR rule works on its own, but fails when included in another rule

I am trying to write an ANTLR grammar for a reparsed and retagged kconfig file (retagged to solve a couple of ambiguities). A simplified version of the grammar is:
grammar FailureExample;
options {
language = Java;
}
@lexer::header {
package parse.failure.example;
}
: configStatement*
: (type
| defConfigStatement
| dependsOnStatement
| helpStatement
| rangeStatement
| defaultStatement
| selectStatement
| visibleIfStatement
| prompt
type : FAKE1;
dependsOnStatement: FAKE2;
helpStatement: FAKE3;
rangeStatement: FAKE4;
defaultStatement: FAKE5;
: defConfigType expression
//expression parsing
| L_PAREN expression R_PAREN
: NOT* primative
: negationExpression (OR negationExpression)*
: orExpression (AND orExpression)*
: andExpression (NOT_EQUAL andExpression)?
: unequalExpression (EQUAL unequalExpression)?
: equalExpression (BECOMES equalExpression)?
DEF_BOOL: 'def_bool';
CONFIG : 'config';
COMMENT : '#' .* ('\n'|'\r') {$channel = HIDDEN;};
AND : '&&';
OR : '||';
NOT : '!';
L_PAREN : '(';
R_PAREN : ')';
BECOMES : '::=';
EQUAL : '=';
NOT_EQUAL : '!=';
FAKE1 : 'fake1';
FAKE2: 'fake2';
FAKE3: 'fake3';
FAKE4: 'fake4';
FAKE5: 'fake5';
FAKE6: 'fake6';
FAKE7: 'fake7';
FAKE8: 'fake8';
IDENT : (LETTER | DIGIT | '_')*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
With input:
def_bool n
I can set antlrworks to parse just the second line (commenting out the first) and I get the proper defConfigStatement token emitted with the proper expression following. However, if I exercise either the configOptions rule or the configStatement rule (with the first line uncommented), my configOptions results in an empty set and a NoViableAlt exception is thrown.
What would cause this behavior? I know that the defConfigStatement rule is accurate and can parse correctly, but as soon as it's added as a potential option in another rule, it fails. I know I don't have conflicting rules, and I've made DEF_BOOL and DEF_TRISTATE rules the top in my list of lexer rules, so they have priority over the other lexer rules.
/Added since edit/
To further complicate the issue, if I move the defConfigStatement choice in the configOptions rule, it will work, but other rules will fail.
Edit: Using full, simplified grammar.
In short, why does the rule work on its own, but fail when it's in configOptions (especially since configOptions is in (A | B | C)* form)?
When I parse the input:
def_bool n
with the parser generated from your grammar, I get the following parse tree:
So, I see no issues here. My guess is that you're using ANTLRWorks' interpreter: don't. It's buggy. Always test your grammar with a class of your own, or use ANTLRWorks' debugger (press CTRL+D to launch it). The debugger works like a charm (without the package declaration, btw). The image I posted above is an export from the debugger.
If the debugger doesn't work, try (temporarily) removing the package declaration (note that you're only declaring a package for the lexer, not the parser, but that might be caused by posting a minimal grammar). You could also try changing the port number the debugger connects to, since the port may already be in use (see: File -> Preferences -> Debugger-tab).