Ambiguous grammar (using Bison) - grammar

I've got a problem with an ambiguous grammar. I've got this:
%token identifier
%token lolcakes
%start program
%%
program
: call_or_definitions;
expression
: identifier
| lolcakes;
expressions
: expression
| expressions ',' expression;
call_or_definition
: function_call
| function_definition;
call_or_definitions
: call_or_definition
| call_or_definitions call_or_definition;
function_argument_core
: identifier
| identifier '=' expression
| identifier '=' '{' expressions '}';
function_call
: expression '(' function_arguments ')' ';';
function_definition
: identifier '(' function_definition_arguments ')' '{' '}';
function_argument
: lolcakes
| function_argument_core;
function_arguments
: function_argument
| function_arguments ',' function_argument
function_definition_argument
: expression function_argument_core
| function_argument_core;
function_definition_arguments
: function_definition_argument
| function_definition_arguments ',' function_definition_argument;
It's a subset of my genuine grammar which is separately compilable. At the moment, it generates an S/R conflict between function_call and function_definition when encountering the stream identifier (. I'm trying to convince Bison that it doesn't need to make the decision until later in the token stream by unifying the grammar for function calls and function definitions. In other words, if it encounters something that's common to both calls and definitions, it can reduce that without needing to know which is which, and if it encounters something else, that something else would clearly label which is which. Is that even possible, and if so, how can I do it? I'd really rather avoid having to alter the structure of the input stream if possible.

The problem should not arise until you see identifier ( identifier with a lookahead of , or ). At that point the parser has to decide whether to reduce the second identifier as a function_definition_argument or an expression (to become function_argument).
You can solve this purely in the grammar by brute force, but it will lead you into a maze of nonterminals like expression_not_naked_identifier and ambiguous_begining_of_function_defn_or_call, with resulting rampant duplication of semantic actions.
It would probably be more maintainable (and lead to more intelligible syntax error messages) to write something like
definition_or_call_start: identifier '(' generic_argument_list ')'
generic_argument_list: generic_argument
| generic_argument_list ',' generic_argument
generic_argument: expression
| function_argument_core
| ...
function_call: definition_or_call_start ';';
function_definition : definition_or_call_start '{' '}';
and then check as a semantic constraint in the action for the last two productions that the actual generic_arguments you have parsed match the use they're being put to.

The problem is that an expression can consist of a single identifier. At this time the parser needs to decide whether it's a identifier only or if it shall reduce it to expression, since that will decide on the path afterwards.

Related

How to disambiguate a subselect from a parenthesized expression?

I have the following expression notation:
expr
: OpenParen expr (Comma expr)* Comma? CloseParen # parenExpr
| OpenParen simpleSelect CloseParen # subSelectExpr
Unfortunately, a simpleSelect can also have a parenthetical around it, and so the following statement becomes ambiguous:
select ((select 1))
Here is the current grammar that I have, simplified down to only showing the issue:
grammar Subselect;
options { caseInsensitive=true; }
statement: query_statement EOF;
query_statement
: query_expr # simple
| query_statement set_op query_statement # set
;
query_expr
: with_clause?
( select | '(' query_statement ')' )
limit_clause?
;
select
: select_clause
(from_clause
where_clause?)?
;
with_clause: 'WITH' expr 'AS (' select ')';
select_clause: 'SELECT' expr (',' expr)*;
from_clause: 'FROM' expr;
where_clause: 'WHERE' expr;
limit_clause: 'LIMIT' expr;
set_op: 'UNION'|'INTERSECT'|'EXCEPT';
expr
: '(' expr ')' # parenExpr
| '(' query_expr ')' # subSelect
| Atom # identifier
;
Atom: [a-z_0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
And on the parse of select ((select 1)), here is the output:
What would be a possible way to disambiguate this?
I suppose the main thing is here:
'(' query_statement ')'
Since that recursively calls itself -- is there a way to do indirection or something else such that a query_statement called from within parens can never itself have parens?
Also, maybe this is a common thing? I get the same ambiguous output when running this on the official MySQL grammar here:
I would be curious whether any of the grammars can solve the issue here: https://github.com/antlr/grammars-v4/tree/master/sql. Maybe the best approach is just to remove duplicate parens before parsing the text? (If so, are there are good tools to do that, or do I need to write an additional antlr parser just to do that?)
Your input generates this parse tree:
That's a reasonable interpretation of your input and it is identified as a subSelect expr. It's a subSelect nested in a parenExpr (both of which are exprs).
If I switch up your rule a bit:
expr: '(' query_expr ')' # subSelect
| '(' expr ')' # parenExpr
| Atom # identifier
;
Now it's a subSelect that interprets the nested (select 1) as a query expression.
It's ambiguous because the outer parenthesized expression could match either of the first two alternatives resulting in different parse trees.
In ANTLR, ambiguities in alternatives are resolved by "using" the first alternative that matches. In this way ANTLR has deterministic behavior where you can control which interpretation is used (with alternative order). It's not uncommon for ANTLR grammars to have ambiguities like this.
IMHO, the IntelliJ plugin has caused many people to stumble over this as an indication that something is "wrong" with the grammar. There's a reason that ANTLR itself does not report an error in this situation. It has defined, deterministic behavior.
So far as "resolving" this ambiguity: the simple fact that the syntax uses parentheses to indicate two different "things" indicates that it is inherently ambiguous, so I don't believe you can "fix" the grammar ambiguity without modifying the syntax. (I might be wrong about this, and would find it interesting if someone provides a refactoring that manages to remove the ambiguity.)
EDIT:
After trying an earlier solution that proved incorrect with some additional test data, I've tried a different approach.
I added Atom as a viable alternative for query_expr since that Atom '1` is being offered as test data. In the full grammar implementation, it's hard to predict if this is necessary, even sufficent. I have only the grammar above with which to test.
I used some semantic predicates to strip parentheses (avoids the effort of writing an additional parser).
For testing purposes only, I added SQL-style line comments so that I could test many different inputs quickly.
The following SQL statements were tested, showing no ambiguity.
select 1
select (1)
select ((select 1))
select ((select (abc)))
select abc from ((select 1 from (select((select(1))))))
(select 1 from (select((select(1)))))
((select (xyz) from (select (((((foo))))) from tableX)))
select a from (select x from xyz)
union
select b from abc
select a from ((select x from xyz ))
intersect
((select b from foo))
select a from (select x from xyz )
intersect
(select b from foo)
The grammar is as follows:
grammar Subselect;
options { caseInsensitive=true; }
#header
{
import java.util.*;
}
#parser::members
{
String stripParens(String phrase)
{
String temp1 = phrase.substring[1];
temp2 = temp1.substring(0, s.length()-1);
return temp2;
}
}
statement: query_statement EOF;
query_statement
: query_expr # simple
| query_statement set_op query_statement # set
;
query_expr
: with_clause?
( select | '(' query_statement ')' )
limit_clause?
| Atom
;
select
: select_clause
(from_clause
where_clause?)?
;
with_clause: 'WITH' expr 'AS (' select ')';
select_clause: 'SELECT' expr (',' expr)*;
from_clause: 'FROM' expr;
where_clause: 'WHERE' expr;
limit_clause: 'LIMIT' expr;
set_op: 'UNION'|'INTERSECT'|'EXCEPT';
lrpExpr
: {stripParens(_input.LT[1].getText())}? query_expr
;
expr
: '(' lrpExpr ')' # parenExpr
| Atom # identifier
;
//---------------------------------------------
Atom: [a-z_0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
LineComment : '--' ~[\r\n]* -> skip ;
I'm not including images of parse trees in this edit to conserve space. However, from the inputs I tested, lrpExpr, being a separate rule, would give e.g. a Visitor class to evaluate what is inside the parentheses before moving further down the parse tree, so order of evaluation e.g. mathematical operator precedence could still be honored.
All still fast and with zero ambiguity.
I hope this suits your needs better.
Attribution: I used this answer as a starting point for the Java code for the semantic predicate.

Xtext Grammar Ambiguities (Backtracking is not working)

I am trying to use Xtext to design a simple language for operations on sets of numbers.
Here are some examples of strings in the language:
{2,1+6} (A set of numbers 2 and 7)
{1+3, 3+5} + {2..5} (A union of sets {4, 8} and {2, 3, 4, 5})
I am using the following grammar:
grammar org.example.Set with org.eclipse.xtext.common.Terminals
generate set "http://www.set.net/set"
SetAddition returns SetExpression:
SetMultiplication ({SetAddition.left=current} '+' right=SetMultiplication)*
;
SetMultiplication returns SetExpression:
SetPrimary ({SetMultiplication.left=current} ('*'|'\\') right=SetPrimary)*
;
SetPrimary returns SetExpression:
SetAtom | '(' SetAddition ')'
;
SetAtom returns SetExpression:
Set | Range
;
Set:
lhs = '{' (car=ArithmeticTerm (',' cdr+=ArithmeticTerm)*)? '}'
;
Range:
'{' lhs=ArithmeticTerm '.' '.' rhs=ArithmeticTerm '}'
;
ArithmeticTerm:
Addition //| Multiplication
;
Addition returns ArithmeticTerm:
Multiplication ({Addition.lhs=current} ('+'|'-') rhs=Multiplication)*
;
Multiplication returns ArithmeticTerm:
Primary ({Multiplication.lhs=current} ('*'|'/'|'%') rhs=Primary)*
;
Primary returns ArithmeticTerm:
ArithmeticAtom |
'(' Addition ')'
;
ArithmeticAtom:
value = INT
;
When I execute MWE2 workflow, I get the following error:
error(211): ../net.certware.argument.language.ui/src-gen/net/certware/argument/language/ui/contentassist/antlr/internal/InternalL.g:420:1: [fatal] rule rule__SetAtom__Alternatives has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
I do have backtracking enabled in mwe2 file.
I have this fragment of code in it:
// The antlr parser generator fragment.
fragment = parser.antlr.XtextAntlrGeneratorFragment auto-inject {
options = {
backtrack = true
}
}
And there is no other fragments that mention ANTLR in mwe2 file.
The version of Xtext I am using is Xtext 2.8.0 integrated in Full Eclipse available from Xtext Website.
Why does ANTLR suggest me to enable backtracking if it is already enabled?
Is there anything wrong with my grammar?
The error stems from your syntax for
SetPrimary returns SetExpression:
SetAtom | '(' SetAddition ')'
;
and
Primary returns ArithmeticTerm:
ArithmeticAtom |
'(' Addition ')'
;
which can be reduced to
SetPrimary returns SetExpression:
ArithmeticAtom | '(' Addition ')' | '(' SetAddition ')'
;
Since Addition and SetAddition are indistinguishable with finite lookahead (both can start with an infinite number of opening ( ). Therefore you need backtracking in the first place - you may want to reconsider the syntax or the AST structure.
Anyway, please add the backtracking also to the XtextAntlrUiGeneratorFragment in your workflow.

How to parse a parenthesized hierarchy root?

I'm trying to parse values with ANTLR. Here's the relevant part of my grammar:
root : IDENTIFIER | SELF | literal | constructor | call | indexer;
hierarchy : root (SUB^ (IDENTIFIER | call | indexer))*;
factor : hierarchy ((MULT^ | DIV^ | MODULO^) hierarchy)*;
sum : factor ((PLUS^ | MINUS^) factor)*;
comparison : sum (comparison_operator^ sum)*;
value : comparison | '(' value ')';
I won't describe each token or rule since their name is quite explanatory of their role. This grammar works well and compiles, allowing me to parse, using value, things such as:
a.b[c(5).d[3] * e()] < e("f")
The only thing left for value recognition is to be able to have parenthesized hierarchy roots. For instance:
(a.b).c
(3 < d()).e
...
Naively, and without much expectations, I tried adding the following alternative to my root rule:
root : ... | '(' value ')';
This however breaks the value rule due to non-LL(*)ism:
rule value has non-LL(*) decision due to recursive rule invocations reachable
from alts 1,2. Resolve by left-factoring or using syntactic predicates or using
backtrack=true option.
Even after reading most of The Definitive ANTLR Reference, I still don't understand these errors. However, what I do understand is that, upon seeing a parenthesis opening, ANTLR cannot know if it's looking at the beginning of a parenthesized value, or at the beginning of a parenthesized root.
How can I clearly define the behavior of parenthesized hierarchy root?
Edit: As requested, the additional rules:
parameter : type IDENTIFIER -> ^(PARAMETER ^(type IDENTIFIER));
constructor : NEW type PAREN_OPEN (arguments+=value (SEPARATOR arguments+=value)*)? PAREN_CLOSE -> ^(CONSTRUCTOR type ^(ARGUMENTS $arguments*)?);
call : IDENTIFIER PAREN_OPEN (values+=value (SEPARATOR values+=value)*)? PAREN_CLOSE -> ^(CALL IDENTIFIER ^(ARGUMENTS $values*)?);
indexer : IDENTIFIER INDEX_START (values+=value (SEPARATOR values+=value)*)? INDEX_END -> ^(INDEXER IDENTIFIER ^(ARGUMENTS $values*));
Remove '(' value ')' from value and place it in root:
root : IDENTIFIER | SELF | literal | constructor | call | indexer | '(' value ')';
...
value : comparison;
Now (a.b).c will result in the following parse:
And (3 < d()).e in:
Of course, you'll probably want to omit the parenthesis from the AST:
root : IDENTIFIER | SELF | literal | constructor | call | indexer | '('! value ')'!;
Also, you don't need to append tokens in a List using += in your parser rules. The following:
call
: IDENTIFIER PAREN_OPEN (values+=value (SEPARATOR values+=value)*)? PAREN_CLOSE
-> ^(CALL IDENTIFIER ^(ARGUMENTS $values*)?)
;
can be rewritten into:
call
: IDENTIFIER PAREN_OPEN (value (SEPARATOR value)*)? PAREN_CLOSE
-> ^(CALL IDENTIFIER ^(ARGUMENTS value*)?)
;
EDIT
Your main problem is the fact that certain input can be parsed in two (or more!) ways. For example, the input (a) could be parsed by alternative 1 and 2 of your value rule:
value
: comparison // alternative 1
| '(' value ')' // alternative 2
;
Run through your parser rules: a comparison (alternative 1) can match (a) because it matches the root rule, which in its turn matches '(' value ')'. But that is also what alternative 2 matches! And there you have it: the parser "sees" for one input, two different
parses and reports about this ambiguity.

How to resolve this parsing ambiguitiy in Antlr3

Hopefully this is just the right amount of information to help me solve this problem.
Given the following ANTLR3 syntax
grammar mygrammar;
program : statement* | function*;
function : ID '(' args ')' '->' statement+ (','statement+) '.' ;
args : arg (',' arg)*;
arg : ID ('->' expression)?;
statement : assignment
| number
| string
;
assignment : ID '->' expression;
string : UNICODE_STRING;
number : HEX_NUMBER | INTEGER ( '.' INTEGER )?;
// ================================================================
HEX_NUMBER : '0x' HEX_DIGIT+;
INTEGER : DIGIT+;
fragment
DIGIT : ('0'..'9');
Here is the line that is causing problems in the parser.
my_function(x, y, z -> 42) -> 10001.
ANTLRWorks highlights the last . after the 10001 in red as being a problem with the following error.
How can I make this stop throwing org.antlr.runtime.EarlyExitException?
I am sure this is because of some ambiguity between my number parser rule and trying to use the . as a EOL delimiter.
There is another ambiguity that also needs fixing. Change:
program : statement* | function*;
into:
program : (statement | function)*;
(although the 2 are not equivalent, I'm guessing you want the latter)
And in your function rule, you now defined there to be at least 2 statements:
function : ID '(' args ')' '->' statement (','statement)+ '.' ;
while I'm guessing you really want at least one:
function : ID '(' args ')' '->' statement (','statement)* '.' ;
Now, your real problem: since you're constructing floats in a parser rule, from the end of your input, 10001., the parser tries to construct a number of it, while you want it to match an INTEGER and then a ., as you yourself already said in your OP.
To fix this, you need to give the parser a bit of extra look-ahead to "see" beyond this ambiguity. Do that by adding the predicate (INTEGER '.' INTEGER)=> before actually matching said input:
number
: HEX_NUMBER
| (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER
| INTEGER
;
Now your input will generate the following parse tree:
Perhaps unrelated, but I'm curious none-the-less:
function : ID '(' args ')' '->' statement+ (','statement+) '.' ;
Should this instead be:
function : ID '(' args ')' '->' statement (',' statement)* '.' ;
I think the first one would require a single comma in a function definition but the second one would require a comma as a statement separator.
Also, does the rule for args allow z -> 42 correctly?

Left-factoring grammar of coffeescript expressions

I'm writing an Antlr/Xtext parser for coffeescript grammar. It's at the beginning yet, I just moved a subset of the original grammar, and I am stuck with expressions. It's the dreaded "rule expression has non-LL(*) decision" error. I found some related questions here, Help with left factoring a grammar to remove left recursion and ANTLR Grammar for expressions. I also tried How to remove global backtracking from your grammar, but it just demonstrates a very simple case which I cannot use in real life. The post about ANTLR Grammar Tip: LL() and Left Factoring gave me more insights, but I still can't get a handle.
My question is how to fix the following grammar (sorry, I couldn't simplify it and still keep the error). I guess the trouble maker is the term rule, so I'd appreciate a local fix to it, rather than changing the whole thing (I'm trying to stay close to the rules of the original grammar). Pointers are also welcome to tips how to "debug" this kind of erroneous grammar in your head.
grammar CoffeeScript;
options {
output=AST;
}
tokens {
AT_SIGIL; BOOL; BOUND_FUNC_ARROW; BY; CALL_END; CALL_START; CATCH; CLASS; COLON; COLON_SLASH; COMMA; COMPARE; COMPOUND_ASSIGN; DOT; DOT_DOT; DOUBLE_COLON; ELLIPSIS; ELSE; EQUAL; EXTENDS; FINALLY; FOR; FORIN; FOROF; FUNC_ARROW; FUNC_EXIST; HERECOMMENT; IDENTIFIER; IF; INDENT; INDEX_END; INDEX_PROTO; INDEX_SOAK; INDEX_START; JS; LBRACKET; LCURLY; LEADING_WHEN; LOGIC; LOOP; LPAREN; MATH; MINUS; MINUS; MINUS_MINUS; NEW; NUMBER; OUTDENT; OWN; PARAM_END; PARAM_START; PLUS; PLUS_PLUS; POST_IF; QUESTION; QUESTION_DOT; RBRACKET; RCURLY; REGEX; RELATION; RETURN; RPAREN; SHIFT; STATEMENT; STRING; SUPER; SWITCH; TERMINATOR; THEN; THIS; THROW; TRY; UNARY; UNTIL; WHEN; WHILE;
}
COMPARE : '<' | '==' | '>';
COMPOUND_ASSIGN : '+=' | '-=';
EQUAL : '=';
LOGIC : '&&' | '||';
LPAREN : '(';
MATH : '*' | '/';
MINUS : '-';
MINUS_MINUS : '--';
NEW : 'new';
NUMBER : ('0'..'9')+;
PLUS : '+';
PLUS_PLUS : '++';
QUESTION : '?';
RELATION : 'in' | 'of' | 'instanceof';
RPAREN : ')';
SHIFT : '<<' | '>>';
STRING : '"' (('a'..'z') | ' ')* '"';
TERMINATOR : '\n';
UNARY : '!' | '~' | NEW;
// Put it at the end, so keywords will be matched earlier
IDENTIFIER : ('a'..'z' | 'A'..'Z')+;
WS : (' ')+ {skip();} ;
root
: body
;
body
: line
;
line
: expression
;
assign
: assignable EQUAL expression
;
expression
: value
| assign
| operation
;
identifier
: IDENTIFIER
;
simpleAssignable
: identifier
;
assignable
: simpleAssignable
;
value
: assignable
| literal
| parenthetical
;
literal
: alphaNumeric
;
alphaNumeric
: NUMBER
| STRING;
parenthetical
: LPAREN body RPAREN
;
// term should be the same as expression except operation to avoid left-recursion
term
: value
| assign
;
questionOp
: term QUESTION?
;
mathOp
: questionOp (MATH questionOp)*
;
additiveOp
: mathOp ((PLUS | MINUS) mathOp)*
;
shiftOp
: additiveOp (SHIFT additiveOp)*
;
relationOp
: shiftOp (RELATION shiftOp)*
;
compareOp
: relationOp (COMPARE relationOp)*
;
logicOp
: compareOp (LOGIC compareOp)*
;
operation
: UNARY expression
| MINUS expression
| PLUS expression
| MINUS_MINUS simpleAssignable
| PLUS_PLUS simpleAssignable
| simpleAssignable PLUS_PLUS
| simpleAssignable MINUS_MINUS
| simpleAssignable COMPOUND_ASSIGN expression
| logicOp
;
UPDATE:
The final solution will use Xtext with an external lexer to avoid to intricacies of handling significant whitespace. Here is a snippet from my Xtext version:
CompareOp returns Operation:
AdditiveOp ({CompareOp.left=current} operator=COMPARE right=AdditiveOp)*;
My strategy is to make a working Antlr parser first without a usable AST. (Well, it would deserve a separates question if this is a feasible approach.) So I don't care about tokens at the moment, they are included to make development easier.
I am aware that the original grammar is LR. I don't know how close I can stay to it when transforming to LL.
UPDATE2 and SOLUTION:
I could simplify my problem with the insights gained from Bart's answer. Here is a working toy grammar to handle simple expressions with function calls to illustrate it. The comment before expression shows my insight.
grammar FunExp;
ID: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
NUMBER: '0'..'9'+;
WS: (' ')+ {skip();};
root
: expression
;
// atom and functionCall would go here,
// but they are reachable via operation -> term
// so they are omitted here
expression
: operation
;
atom
: NUMBER
| ID
;
functionCall
: ID '(' expression (',' expression)* ')'
;
operation
: multiOp
;
multiOp
: additiveOp (('*' | '/') additiveOp)*
;
additiveOp
: term (('+' | '-') term)*
;
term
: atom
| functionCall
| '(' expression ')'
;
When you generate a lexer and parser from your grammar, you see the following error printed to your console:
error(211): CoffeeScript.g:52:3: [fatal] rule expression has non-LL(*) decision due to recursive rule invocations reachable from alts 1,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
warning(200): CoffeeScript.g:52:3: Decision can match input such as "{NUMBER, STRING}" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
(I've emphasized the important bits)
This is only the first error, but you start with the first and with a bit of luck, the errors below that first one will also disappear when you fix the first one.
The error posted above means that when you're trying to parse either a NUMBER or a STRING with the parser generated from your grammar, the parser can go two ways when it ends up in the expression rule:
expression
: value // choice 1
| assign // choice 2
| operation // choice 3
;
Namely, choice 1 and choice 3 both can parse a NUMBER or a STRING, as you can see by the "paths" the parser can follow to match these 2 choices:
choice 1:
expression
value
literal
alphaNumeric : {NUMBER, STRING}
choice 3:
expression
operation
logicOp
relationOp
shiftOp
additiveOp
mathOp
questionOp
term
value
literal
alphaNumeric : {NUMBER, STRING}
In the last part of the warning, ANTLR informs you that it ignores choice 3 whenever either a NUMBER or a STRING will be parsed, causing choice 1 to match such input (since it is defined before choice 3).
So, either the CoffeeScript grammar is ambiguous in this respect (and handles this ambiguity somehow), or your implementation of it is wrong (I'm guessing the latter :)). You need to fix this ambiguity in your grammar: i.e. don't let the expression's choices 1 and 3 both match the same input.
I noticed 3 other things in your grammar:
1
Take the following lexer rules:
NEW : 'new';
...
UNARY : '!' | '~' | NEW;
Be aware that the token UNARY can never match the text 'new' since the token NEW is defined before it. If you want to let UNARY macth this, remove the NEW rule and do:
UNARY : '!' | '~' | 'new';
2
In may occasions, you're collecting multiple types of tokens in a single one, like LOGIC:
LOGIC : '&&' | '||';
and then you use that token in a parser rules like this:
logicOp
: compareOp (LOGIC compareOp)*
;
But if you're going to evaluate such an expression at a later stage, you don't know what this LOGIC token matched ('&&' or '||') and you'll have to inspect the token's inner text to find that out. You'd better do something like this (at least, if you're doing some sort of evaluating at a later stage):
AND : '&&';
OR : '||';
...
logicOp
: compareOp ( AND compareOp // easier to evaluate, you know it's an AND expression
| OR compareOp // easier to evaluate, you know it's an OR expression
)*
;
3
You're skipping white spaces (and no tabs?) with:
WS : (' ')+ {skip();} ;
but doesn't CoffeeScript indent it's code block with spaces (and tabs) just like Python? But perhaps you're going to do that in a later stage?
I just saw that the grammar you're looking at is a jison grammar (which is more or less a bison implementation in JavaScript). But bison, and therefor jison, generates LR parsers while ANTLR generates LL parsers. So trying to stay close to the rules of the original grammar will only result in more problems.