How to write Antlr rules in a non-recursive way? - antlr

I have the following the expressions I need to parse
and(true, false)
or(true, false, true)
not(or(true, false, true))
and(and(true, false), false)
or(or(true, false, true), true)
so far I have the following grammar
expr
: orExpr
;
orExpr
: OR '(' andExpr (',' andExpr)+ ')'
| andExpr
;
andExpr
: AND '(' equalityExpr (',' equalityExpr)+ ')'
| equalityExpr
;
equalityExpr
: comparison ((EQUALS | NOT_EQUALS) comparison)*
;
comparison
: notExpr ((GREATER_THAN_EQUALS | LESS_THAN_EQUALS | GREATER_THAN | LESS_THAN ) notExpr)?
;
notExpr
: NOT '(' expr ')'
| primary
;
primary
: '(' expr ')'
| true
| false
;
I am not able to parse and(and(true, false), false) with the above grammar? where am I going wrong?
Please assume there is precedence between AND OR NOT (although I understand it may look not necessary)

But for now, I need to be able to parse true=and(5=5, 4=4, 3>2) or vice-versa and(5=5, 4=4, 3>2)=true ?
In that case, there is absolutely no need to complicate things. All you have to do is this:
grammar Test;
parse
: expr EOF
;
expr
: '(' expr ')'
| OR '(' expr (',' expr)+ ')'
| AND '(' expr (',' expr)+ ')'
| NOT '(' expr ')'
| expr (LT | LTE | GT | GTE) expr
| expr (EQ | NEQ) expr
| TRUE
| FALSE
| INT
;
LT : '<';
LTE : '<=';
GT : '>';
GTE : '>=';
NEQ : '!=';
EQ : '=';
NOT : 'not';
TRUE : 'true';
FALSE : 'false';
AND : 'and';
OR : 'or';
INT
: [0-9]+
;
SPACES
: [ \t\r\n] -> skip
;

Related

ANTLR arithmetic and comparison expressions grammer ANTLR

how to add relational operations to my code
Thanks
My code is
grammar denem1;
options {
output=AST;
}
tokens {
ROOT;
}
parse
: stat+ EOF -> ^(ROOT stat+)
;
stat
: expr ';'
;
expr
: Id Assign expr -> ^(Assign Id expr)
| add
;
add
: mult (('+' | '-')^ mult)*
;
mult
: atom (('*' | '/')^ atom)*
;
atom
: Id
| Num
| '('! expr ')' !
;
Assign : '=' ;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
Like this:
...
expr
: Id Assign expr -> ^(Assign Id expr)
| rel
;
rel
: add (('<=' | '<' | '>=' | '>')^ add)?
;
add
: mult (('+' | '-')^ mult)*
;
...
If possible, use ANTLR v4 instead of the old v3. In v4, you can simply do this:
stat
: expr ';'
;
expr
: Id Assign expr
| '-' expr
| expr ('*' | '/') expr
| expr ('+' | '-') expr
| expr ('<=' | '<' | '>=' | '>') expr
| Id
| Num
| '(' expr ')'
;

grammar does not separate '123 and ] though the rule is set for it

I am new to antlr. I am trying to parse some queries like [network-traffic:src_port = '123] and [network-traffic:src_port =] and [network-traffic:src_port = ] and ... I have a grammar as follows:
grammar STIXPattern;
pattern
: observationExpressions EOF
;
observationExpressions
: <assoc=left> observationExpressions FOLLOWEDBY observationExpressions #observationExpressionsFollowedBY
| observationExpressionOr #observationExpressionOr_
;
observationExpressionOr
: <assoc=left> observationExpressionOr OR observationExpressionOr #observationExpressionOred
| observationExpressionAnd #observationExpressionAnd_
;
observationExpressionAnd
: <assoc=left> observationExpressionAnd AND observationExpressionAnd #observationExpressionAnded
| observationExpression #observationExpression_
;
observationExpression
: LBRACK comparisonExpression RBRACK # observationExpressionSimple
| LPAREN observationExpressions RPAREN # observationExpressionCompound
| observationExpression startStopQualifier # observationExpressionStartStop
| observationExpression withinQualifier # observationExpressionWithin
| observationExpression repeatedQualifier # observationExpressionRepeated
;
comparisonExpression
: <assoc=left> comparisonExpression OR comparisonExpression #comparisonExpressionOred
| comparisonExpressionAnd #comparisonExpressionAnd_
;
comparisonExpressionAnd
: <assoc=left> comparisonExpressionAnd AND comparisonExpressionAnd #comparisonExpressionAnded
| propTest #comparisonExpressionAndpropTest
;
propTest
: objectPath NOT? (EQ|NEQ) primitiveLiteral # propTestEqual
| objectPath NOT? (GT|LT|GE|LE) orderableLiteral # propTestOrder
| objectPath NOT? IN setLiteral # propTestSet
| objectPath NOT? LIKE StringLiteral # propTestLike
| objectPath NOT? MATCHES StringLiteral # propTestRegex
| objectPath NOT? ISSUBSET StringLiteral # propTestIsSubset
| objectPath NOT? ISSUPERSET StringLiteral # propTestIsSuperset
| LPAREN comparisonExpression RPAREN # propTestParen
| objectPath NOT? (EQ|NEQ) objectPathThl # propTestThlEqual
;
startStopQualifier
: START TimestampLiteral STOP TimestampLiteral
;
withinQualifier
: WITHIN (IntPosLiteral|FloatPosLiteral) SECONDS
;
repeatedQualifier
: REPEATS IntPosLiteral TIMES
;
objectPath
: objectType COLON firstPathComponent objectPathComponent?
;
objectPathThl
: varThlType DOT firstPathComponent objectPathComponent?
;
objectType
: IdentifierWithoutHyphen
| IdentifierWithHyphen
;
varThlType
: IdentifierWithoutHyphen
| IdentifierWithHyphen
;
firstPathComponent
: IdentifierWithoutHyphen
| StringLiteral
;
objectPathComponent
: <assoc=left> objectPathComponent objectPathComponent # pathStep
| '.' (IdentifierWithoutHyphen | StringLiteral) # keyPathStep
| LBRACK (IntPosLiteral|IntNegLiteral|ASTERISK) RBRACK # indexPathStep
;
setLiteral
: LPAREN RPAREN
| LPAREN primitiveLiteral (COMMA primitiveLiteral)* RPAREN
;
primitiveLiteral
: orderableLiteral
| BoolLiteral
| edgeCases
;
edgeCases
: QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign) RBRACK
| RBRACK
;
orderableLiteral
: IntPosLiteral
| IntNegLiteral
| FloatPosLiteral
| FloatNegLiteral
| StringLiteral
| BinaryLiteral
| HexLiteral
| TimestampLiteral
;
IntNegLiteral :
'-' ('0' | [1-9] [0-9]*)
;
IntNoSign :
('0' | [1-9] [0-9]*)
;
IntPosLiteral :
'+'? ('0' | [1-9] [0-9]*)
;
FloatNegLiteral :
'-' [0-9]* '.' [0-9]+
;
FloatPosLiteral :
'+'? [0-9]* '.' [0-9]+
;
HexLiteral :
'h' QUOTE TwoHexDigits* QUOTE
;
BinaryLiteral :
'b' QUOTE
( Base64Char Base64Char Base64Char Base64Char )*
( (Base64Char Base64Char Base64Char Base64Char )
| (Base64Char Base64Char Base64Char ) '='
| (Base64Char Base64Char ) '=='
)
QUOTE
;
StringLiteral :
QUOTE ( ~['\\] | '\\\'' | '\\\\' )* QUOTE
;
BoolLiteral :
TRUE | FALSE
;
TimestampLiteral :
't' QUOTE
[0-9] [0-9] [0-9] [0-9] HYPHEN
( ('0' [1-9]) | ('1' [012]) ) HYPHEN
( ('0' [1-9]) | ([12] [0-9]) | ('3' [01]) )
'T'
( ([01] [0-9]) | ('2' [0-3]) ) COLON
[0-5] [0-9] COLON
([0-5] [0-9] | '60')
(DOT [0-9]+)?
'Z'
QUOTE
;
//////////////////////////////////////////////
// Keywords
AND: 'AND' ;
OR: 'OR' ;
NOT: 'NOT' ;
FOLLOWEDBY: 'FOLLOWEDBY';
LIKE: 'LIKE' ;
MATCHES: 'MATCHES' ;
ISSUPERSET: 'ISSUPERSET' ;
ISSUBSET: 'ISSUBSET' ;
LAST: 'LAST' ;
IN: 'IN' ;
START: 'START' ;
STOP: 'STOP' ;
SECONDS: 'SECONDS' ;
TRUE: 'true' ;
FALSE: 'false' ;
WITHIN: 'WITHIN' ;
REPEATS: 'REPEATS' ;
TIMES: 'TIMES' ;
// After keywords, so the lexer doesn't tokenize them as identifiers.
// Object types may have unquoted hyphens, but property names
// (in object paths) cannot.
IdentifierWithoutHyphen :
[a-zA-Z_] [a-zA-Z0-9_]*
;
IdentifierWithHyphen :
[a-zA-Z_] [a-zA-Z0-9_-]*
;
EQ : '=' | '==';
NEQ : '!=' | '<>';
LT : '<';
LE : '<=';
GT : '>';
GE : '>=';
QUOTE : '\'';
COLON : ':' ;
DOT : '.' ;
COMMA : ',' ;
RPAREN : ')' ;
LPAREN : '(' ;
RBRACK : ']' ;
LBRACK : '[' ;
PLUS : '+' ;
HYPHEN : MINUS ;
MINUS : '-' ;
POWER_OP : '^' ;
DIVIDE : '/' ;
ASTERISK : '*';
EQRBRAC : ']';
fragment HexDigit: [A-Fa-f0-9];
fragment TwoHexDigits: HexDigit HexDigit;
fragment Base64Char: [A-Za-z0-9+/];
// Whitespace and comments
//
WS : [ \t\r\n\u000B\u000C\u0085\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
// Catch-all to prevent lexer from silently eating unusable characters.
InvalidCharacter
: .
;
Now when I feed [network-traffic:src_port = '123] I expect antlr to parse the query to '123 and ]
However the grammar return '123] and is not able to separate '123 and ]
Am missing anything?
grammar does not separate '123 and ] though the rule is set for it
That is not true. The quote and 123 are separate tokens. As demonstrated/suggested in your previous ANTLR question: start by printing all the tokens to your console to see what tokens are being created. This should always be the first thing you do when trying to debug an ANTLR grammar. It will save you a lot of time and headache.
The fact [network-traffic:src_port = '123] is not parsed properly, is because the ](RBRACK) is being consumed by the alternative observationExpressionSimple:
observationExpression
: LBRACK comparisonExpression RBRACK # observationExpressionSimple
| LPAREN observationExpressions RPAREN # observationExpressionCompound
| observationExpression startStopQualifier # observationExpressionStartStop
| observationExpression withinQualifier # observationExpressionWithin
| observationExpression repeatedQualifier # observationExpressionRepeated
;
Because RBRACK was already consumed by a parser rule, the edgeCases rule can't consume this RBRACK token as well.
To fix this, change your rule:
edgeCases
: QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign) RBRACK
| RBRACK
;
into this:
edgeCases
: QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign)
;
Now [network-traffic:src_port = '123] will be parsed properly:

ANTLR4 Token is not recognized when substituted

I try to modify the grammar of the sqlite syntax (I'm interested in a variant of the where clause only) and I'm keep having a weird error when substituting AND to it's own token.
grammar wtfql;
/*
SQLite understands the following binary operators, in order from highest to
lowest precedence:
||
* / %
+ -
<< >> & |
< <= > >=
= != <> IS IS NOT IN LIKE GLOB MATCH REGEXP
AND
OR
*/
start : expr EOF?;
expr
: literal_value
//BIND_PARAMETER
| ( table_name '.' )? column_name
| unary_operator expr
| expr '||' expr
| expr ( '*' | '/' | '%' ) expr
| expr ( '+' | '-' ) expr
| expr ( '<' | '<=' | '>' | '>=' ) expr
| expr ( '=' | '<>' | K_IN ) expr
| expr K_AND expr
| expr K_OR expr
| function_name '(' ( expr ( ',' expr )* )? ')'
| '(' expr ')'
| expr K_NOT expr
| expr ( K_NOT K_NULL )
| expr K_NOT? K_IN ( '(' ( expr ( ',' expr )* ) ')' )
;
unary_operator
: '-'
| '+'
| K_NOT
;
literal_value
: NUMERIC_LITERAL
| STRING_LITERAL
| K_NULL
;
function_name
: IDENTIFIER
;
table_name
: any_name
;
column_name
: any_name
;
any_name
: IDENTIFIER
| keyword
// | '(' any_name ')'
;
keyword
: K_AND
| K_NOT
| K_NULL
| K_IN
| K_OR
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]* // TODO check: needs more chars in set
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\"' ( ~'\"' | '\"\"' )* '\"'
;
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
DOT : '.';
OPEN_PAR : '(';
CLOSE_PAR : ')';
COMMA : ',';
STAR : '*';
PLUS : '+';
MINUS : '-';
TILDE : '~';
DIV : '/';
MOD : '%';
AMP : '&';
PIPE : '|';
LT : '<';
LT_EQ : '<=';
GT : '>';
GT_EQ : '>=';
EQ : '=';
NOT_EQ2 : '<>';
K_AND : A N D;
K_NOT : N O T;
K_NULL : N U L L;
K_OR : O R;
K_IN : I N;
fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
writing
| expr K_AND expr
with the input
field1=1 and field2 = 2
results in
line 1:8 mismatched input 'and' expecting {<EOF>, '||', '*', '+', '-', '/', '%', '<', '<=', '>', '>=', '=', '<>', K_AND, K_NOT, K_OR, K_IN}
while
| expr 'and' expr
works like a charm:
$ antlr4 wtfql.g4 && javac -classpath /usr/local/Cellar/antlr/4.4/antlr-4.4-complete.jar wtfql*.java && cat test.txt | grun wtfql start -tree -gui
(start (expr (expr (expr (column_name (any_name feld1))) = (expr (literal_value 1))) and (expr (expr (column_name (any_name feld2))) = (expr (literal_value 2)))) <EOF>)
What am I missing?
I presume "and" is an IDENTIFIER since the rule for IDENTIFIER comes before the rule for AND and thus wins.
If you write 'and' in the parser rule this implicitly creates a token (not AND!) which comes before IDENTIFIER and thus wins.
Rule of thumb: More specific lexer rules first. Don't create new lexer tokens implicitly in parser rules.
If you check the token type, you'll get a clue what's going on.

Antlr 4 whitespace in string been eliminated

I'm using Antlr 4 to build a compiler for a made up language. I'm having problems with eliminating whitespace properly. It will get rid of whitespace between tokens but it also delete whitespace within the string token which is obviously not what I want. I've tried using modes to clear this issue up with no avail.
Lexer.g4
lexer grammar WaccLexer;
SEMICOLON: ';' ;
WS: [ \n\t\r\u000C]+ -> skip;
EOL: '\n' ;
BEGIN: 'begin' ;
END: 'end' ;
SKIP: 'skip' ;
READ: 'read' ;
FREE: 'free' ;
RETURN: 'return' ;
EXIT: 'exit' ;
IS: 'is' ;
PRINT: 'print' ;
PRINTLN: 'println' ;
IF: 'if' ;
THEN: 'then' ;
ELSE: 'else' ;
FI: 'fi' ;
WHILE: 'while' ;
DO: 'do' ;
DONE: 'done' ;
NEWPAIR: 'newpair' ;
CALL: 'call' ;
FST: 'fst' ;
SND: 'snd' ;
INT: 'int' ;
BOOL: 'bool' ;
CHAR: 'char' ;
STRING: 'string' ;
PAIR: 'pair' ;
EXCLAMATION: '!' ;
LEN: 'len' ;
ORD: 'ord' ;
TOINT: 'toInt' ;
DIGIT: '0'..'9' ;
LOWCHAR: 'a'..'z' ;
R: 'r' ;
F: 'f' ;
N: 'n' ;
T: 't' ;
B: 'b' ;
ZERO: '0' ;
MULTI: '*' ;
DIVIDE: '/' ;
MOD: '%' ;
PLUS: '+' ;
MINUS: '-' ;
GT: '>' ;
GTE: '>=' ;
LT: '<' ;
LTE: '<=' ;
DOUBLEEQUAL: '==' ;
EQUAL: '=' ;
NOTEQUAL: '!=' ;
AND: '&&' ;
OR: '||' ;
UNDERSCORE: '_' ;
UPCHAR: 'A'..'Z' ;
OPENSQUARE: '[' ;
CLOSESQUARE: ']' ;
OPENPARENTHESIS: '(' ;
CLOSEPARENTHESIS: ')' ;
TRUE: 'true' ;
FALSE: 'false' ;
SINGLEQUOT: '\'' ;
DOUBLEQUOT: '\"' ;
BACKSLASH: '\\' ;
COMMA: ',' ;
NULL: 'null' ;
OPENSTRING : DOUBLEQUOT -> pushMode(STRINGMODE) ;
COMMENT: '#' ~[\r\n]* '\r'? '\n' -> skip ;
mode STRINGMODE ;
CLOSESTRING : DOUBLEQUOT -> popMode ;
CHARACTER : ~[\"\'\\] | (BACKSLASH ESCAPEDCHAR) ;
STRLIT : (CHARACTER)* ;
ESCAPEDCHAR : ZERO
| B
| T
| N
| F
| R
| DOUBLEQUOT
| SINGLEQUOT
| BACKSLASH
;
Parser.g4
parser grammar WaccParser;
options {
tokenVocab=WaccLexer;
}
program : BEGIN (func)* stat END EOF;
func : type ident OPENPARENTHESIS (paramlist)? CLOSEPARENTHESIS IS stat END ;
paramlist : param (COMMA param)* ;
param : type ident ;
stat : SKIP
| type ident EQUAL assignrhs
| assignlhs EQUAL assignrhs
| READ assignlhs
| FREE expr
| RETURN expr
| EXIT expr
| PRINT expr
| PRINTLN expr
| IF expr THEN stat ELSE stat FI
| WHILE expr DO stat DONE
| BEGIN stat END
| stat SEMICOLON stat
;
assignlhs : ident
| expr OPENSQUARE expr CLOSESQUARE
| pairelem
;
assignrhs : expr
| arrayliter
| NEWPAIR OPENPARENTHESIS expr COMMA expr CLOSEPARENTHESIS
| pairelem
| CALL ident OPENPARENTHESIS (arglist)? CLOSEPARENTHESIS
;
arglist : expr (COMMA expr)* ;
pairelem : FST expr
| SND expr
;
type : basetype
| type OPENSQUARE CLOSESQUARE
| pairtype
;
basetype : INT
| BOOL
| CHAR
| STRING
;
pairtype : PAIR OPENPARENTHESIS pairelemtype COMMA pairelemtype CLOSEPARENTHESIS ;
pairelemtype : basetype
| type OPENSQUARE CLOSESQUARE
| PAIR
;
expr : intliter
| boolliter
| charliter
| strliter
| pairliter
| ident
| expr OPENSQUARE expr CLOSESQUARE
| unaryoper expr
| expr binaryoper expr
| OPENPARENTHESIS expr CLOSEPARENTHESIS
;
unaryoper : EXCLAMATION
| MINUS
| LEN
| ORD
| TOINT
;
binaryoper : MULTI
| DIVIDE
| MOD
| PLUS
| MINUS nus
| GT
| GTE
| LT
| LTE
| DOUBLEEQUAL
| NOTEQUAL
| AND
| OR
;
ident : (UNDERSCORE | LOWCHAR | UPCHAR) (UNDERSCORE | LOWCHAR | UPCHAR | DIGIT)* ;
intliter : (intsign)? (digit)+ ;
digit : DIGIT ;
intsign : PLUS
| MINUS
;
boolliter : TRUE
| FALSE
;
charliter : CHARACTER;
strliter : OPENSTRING STRLIT CLOSESTRING;
arrayliter : OPENSQUARE (expr (COMMA expr)*)? CLOSESQUARE ;
Please also remember that comment starting with # need to be ignored. Thanks in advance.
The OPENSTRING lexer rule will never be matched in your grammar because the DOUBLEQUOT rule matches exactly the same input sequence and appears before it in the grammar. If you want to define a lexer rule, but you do not actually want that lexer rule to create a token on its own, then you need to define the rule with the fragment modifier.
fragment DOUBLEQUOT : '"';
In addition, you need to correct the warnings that appear when you generate code for your grammar. At least one of them (defined as EPSILON_TOKEN) indicates a major mistake that you made that used to be an error in ANTLR 4.0 but was changed to a warning in ANTLR 4.1 since there is an edge case where it can be used without problems.

Extracting recursion in ANTLR

I've got a grammar in ANTLR and don't understand how it can be recursive. Is there any way to get ANTLR to show the derivation that it used to see that my rules are recursive?
The recursive grammar in it's entirety:
grammar DeadMG;
options {
language = C;
}
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
INT : '0'..'9'+
;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
program
: namespace_scope_definitions;
namespace_scope_definitions
: (namespace_definition | type_definition | function_definition | variable_definition)+;
type_scope_definitions
: (type_definition | function_definition | variable_definition)*;
namespace_definition
: 'namespace' ID '{' namespace_scope_definitions '}';
type_definition
: 'type' ID? (':' expression (',' expression)+ )? '{' type_scope_definitions '}';
function_definition
: ID '(' function_argument_list ')' ('(' function_argument_list ')')? ('->' expression)? compound_statement;
function_argument_list
: expression? ID (':=' expression)? (',' function_argument_list)?;
variable_definition
: 'static'? expression? ID ':=' expression
| 'static'? expression ID ('(' (expression)* ')')?;
literal_expression
: CHAR
| FLOAT
| INT
| STRING
| 'auto'
| 'type'
| type_definition;
primary_expression
: literal_expression
| ID
| '(' expression ')';
expression
: assignment_expression;
assignment_expression
: logical_or_expression (('=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<='| '>>=' | '&=' | '^=' | '|=') assignment_expression)*;
logical_or_expression
: logical_and_expression ('||' logical_and_expression)*;
logical_and_expression
: inclusive_or_expression ('&&' inclusive_or_expression)*;
inclusive_or_expression
: exclusive_or_expression ('|' exclusive_or_expression)*;
exclusive_or_expression
: and_expression ('^' and_expression)*;
and_expression
: equality_expression ('&' equality_expression)*;
equality_expression
: relational_expression (('=='|'!=') relational_expression)*;
relational_expression
: shift_expression (('<'|'>'|'<='|'>=') shift_expression)*;
shift_expression
: additive_expression (('<<'|'>>') additive_expression)*;
additive_expression
: multiplicative_expression (('+' multiplicative_expression) | ('-' multiplicative_expression))*;
multiplicative_expression
: unary_expression (('*' | '/' | '%') unary_expression)*;
unary_expression
: '++' primary_expression
| '--' primary_expression
| ('&' | '*' | '+' | '-' | '~' | '!') primary_expression
| 'sizeof' primary_expression
| postfix_expression;
postfix_expression
: primary_expression
| '[' expression ']'
| '(' expression* ')'
| '.' ID
| '->' ID
| '++'
| '--';
initializer_statement
: expression ';'
| variable_definition ';';
return_statement
: 'return' expression ';';
try_statement
: 'try' compound_statement catch_statement;
catch_statement
: 'catch' '(' ID ')' compound_statement catch_statement?
| 'catch' '(' '...' ')' compound_statement;
for_statement
: 'for' '(' initializer_statement expression? ';' expression? ')' compound_statement;
while_statement
: 'while' '(' initializer_statement ')' compound_statement;
do_while_statement
: 'do' compound_statement 'while' '(' expression ')';
switch_statement
: 'switch' '(' expression ')' '{' case_statement '}';
case_statement
: 'case:' (statement)* case_statement?
| 'default:' (statement)*;
if_statement
: 'if' '(' initializer_statement ')' compound_statement;
statement
: compound_statement
| return_statement
| try_statement
| initializer_statement
| for_statement
| while_statement
| do_while_statement
| switch_statement
| if_statement;
compound_statement
: '{' (statement)* '}';
More specifically, I am having trouble with the following rules:
namespace_scope_definitions
: (namespace_definition | type_definition | function_definition | variable_definition)+;
type_scope_definitions
: (type_definition | function_definition | variable_definition)*;
ANTLR is saying that alternatives 2 and 4, that is, type_definition and variable_definition, are recursive. Here's variable_definition:
variable_definition
: 'static'? expression? ID ':=' expression
| 'static'? expression ID ('(' (expression)* ')')?;
and here's type_definition:
type_definition
: 'type' ID? (':' expression (',' expression)+ )? '{' type_scope_definitions '}';
'type' itself, and type_definition, is a valid expression in my expression syntax. However, removing it is not resolving the ambiguity, so it doesn't originate there. And I have plenty of other ambiguities I need to resolve- detailing all the warnings and errors would be quite too much, so I'd really like to see more details on how they are recursive from ANTLR itself.
My suggestion is to remove most of the operator precedence rules for now:
expression
: multiplicative_expression
(
('+' multiplicative_expression)
| ('-' multiplicative_expression)
)*;
and then inline the rules that have a single caller to isolate the ambiguities. Yes it is tedious.
I found a few ambiguities in the grammar, fixed them and got a lot less warnings. However, I think that probably, LL is just not the right parsing algorithm for me. I am writing a custom parser and lexer. It would still have been nice if ANTLR would show me how it found the problems though, so that I might intervene and fix them.