ANTLR: Why does invalid input match the grammar definition?

I've written a very simple grammar definition for a calculation expression:
grammar SimpleCalc;
options {
output=AST;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
ID : ('a'..'z' | 'A' .. 'Z' | '0' .. '9')+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { Skip(); } ;
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
start: expr EOF;
expr : multExpr ((PLUS | MINUS)^ multExpr)*;
multExpr : atom ((MULT | DIV)^ atom )*;
atom : ID
| '(' expr ')' -> expr;
I tried parsing the invalid expression ABC &* DEF with the start rule, but it was accepted. It looks like the & character is ignored. What's the problem here?

Actually, your invalid expression ABC &= DEF is not accepted; it causes a NoViableAltException.

Related

Antlr4 mismatched input '<' expecting '<' with (seemingly) no lexer ambiguity

I cannot seem to figure out what antlr is doing here in this grammar. I have a grammar that should match an input like:
i,j : bool;
setvar : set<bool>;
i > 5;
j < 10;
But I keep getting an error telling me that "line 3:13 mismatched input '<' expecting '<'". This tells me there is some ambiguity in the lexer, but I only use '<' in a single token.
Here is the grammar:
//// Parser Rules
grammar MLTL1;
start: block*;
block: var_list ';'
| expr ';'
;
var_list: IDENTIFIER (',' IDENTIFIER)* ':' type ;
type: BASE_TYPE
| KW_SET REL_LT BASE_TYPE REL_GT
;
expr: expr REL_OP expr
| '(' expr ')'
| IDENTIFIER
| INT
;
//// Lexical Spec
// Types
BASE_TYPE: 'bool'
| 'int'
| 'float'
;
// Keywords
KW_SET: 'set' ;
// Op groups for precedence
REL_OP: REL_EQ | REL_NEQ | REL_GT | REL_LT
| REL_GTE | REL_LTE ;
// Relational ops
REL_EQ: '==' ;
REL_NEQ: '!=' ;
REL_GT: '>' ;
REL_LT: '<' ;
REL_GTE: '>=' ;
REL_LTE: '<=' ;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
INT
: SIGN? NONZERODIGIT DIGIT*
| '0'
;
fragment
SIGN
: [+-]
;
fragment
DIGIT
: [0-9]
;
fragment
NONZERODIGIT
: [1-9]
;
fragment
LETTER
: [a-zA-Z_]
;
COMMENT : '#' ~[\r\n]* -> skip;
WS : [ \t\r\n]+ -> channel(HIDDEN);
I tested the grammar to see which tokens it generates for the test input above, using this Python script:
from antlr4 import InputStream, CommonTokenStream
import MLTL1Lexer
import MLTL1Parser
input="""
i,j : bool;
setvar: set<bool>;
i > 5;
j < 10;
"""
lexer = MLTL1Lexer.MLTL1Lexer(InputStream(input))
stream = CommonTokenStream(lexer)
stream.fill()
tokens = stream.getTokens(0,100)
for t in tokens:
    print(str(t.type) + " " + t.text)
parser = MLTL1Parser.MLTL1Parser(stream)
parse_tree = parser.start()
print(parse_tree.toStringTree(recog=parser))
And noticed that both '>' and '<' were assigned the same token value despite being two different tokens. Am I missing something here?
(There may be more than just these two instances, but...)
Change REL_OP and BASE_TYPE to parser rules (i.e. make them lowercase).
As you've used them, you're effectively turning many of your intended lexer rules into fragments.
It's important to understand that tokens are the "atoms" of your grammar. When you combine several of them into another lexer rule, that combined rule becomes the token type.
(If you had used grun to dump the tokens, you would have seen them identified as REL_OP tokens.)
With the changes below, your sample input works just fine.
grammar MLTL1;
start: block*;
block: var_list ';' | expr ';';
var_list: IDENTIFIER (',' IDENTIFIER)* ':' type;
type: baseType | KW_SET REL_LT baseType REL_GT;
expr: expr rel_op expr | '(' expr ')' | IDENTIFIER | INT;
//// Lexical Spec
// Types
baseType: 'bool' | 'int' | 'float';
// Keywords
KW_SET: 'set';
// Op groups for precedence
rel_op: REL_EQ | REL_NEQ | REL_GT | REL_LT | REL_GTE | REL_LTE;
// Relational ops
REL_EQ: '==';
REL_NEQ: '!=';
REL_GT: '>';
REL_LT: '<';
REL_GTE: '>=';
REL_LTE: '<=';
IDENTIFIER: LETTER (LETTER | DIGIT)*;
INT: SIGN? NONZERODIGIT DIGIT* | '0';
fragment SIGN: [+-];
fragment DIGIT: [0-9];
fragment NONZERODIGIT: [1-9];
fragment LETTER: [a-zA-Z_];
COMMENT: '#' ~[\r\n]* -> skip;
WS: [ \t\r\n]+ -> channel(HIDDEN);
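To see why '<' and '>' ended up with the same token type, here is a minimal Python sketch of the tie-breaking the lexer applies: the longest match wins, and when two rules match the same text, the rule defined first wins. This is not ANTLR's implementation; the `tokenize` helper and the rule tables are hypothetical, only literal-based rules are modelled (no IDENTIFIER or INT), and BASE_TYPE is kept as a lexer rule in both tables to isolate the effect of REL_OP.

```python
def tokenize(text, rules):
    """Tokenize `text` with an ordered list of (rule_name, literals).

    Longest match wins; on a tie the earliest rule wins, because a later
    rule only replaces the current best when its match is strictly longer.
    """
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is dropped in this sketch
            continue
        best = None  # (match_length, rule_name)
        for name, literals in rules:
            for literal in literals:
                if text.startswith(literal, pos):
                    if best is None or len(literal) > best[0]:
                        best = (len(literal), name)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        length, name = best
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


# Rule order follows the grammar in the question: REL_OP comes before REL_LT etc.
ORIGINAL_RULES = [
    ("BASE_TYPE", ["bool", "int", "float"]),
    ("KW_SET", ["set"]),
    ("REL_OP", ["==", "!=", ">", "<", ">=", "<="]),
    ("REL_EQ", ["=="]),
    ("REL_NEQ", ["!="]),
    ("REL_GT", [">"]),
    ("REL_LT", ["<"]),
    ("REL_GTE", [">="]),
    ("REL_LTE", ["<="]),
]

# Same table without the REL_OP lexer rule (it becomes a parser rule instead).
FIXED_RULES = [rule for rule in ORIGINAL_RULES if rule[0] != "REL_OP"]

print(tokenize("set<bool>", ORIGINAL_RULES))
# [('KW_SET', 'set'), ('REL_OP', '<'), ('BASE_TYPE', 'bool'), ('REL_OP', '>')]
print(tokenize("set<bool>", FIXED_RULES))
# [('KW_SET', 'set'), ('REL_LT', '<'), ('BASE_TYPE', 'bool'), ('REL_GT', '>')]
```

With the original table, the parser rule `KW_SET REL_LT BASE_TYPE REL_GT` can never match, because no REL_LT token is ever produced. That is consistent with the "mismatched input '<' expecting '<'" message.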

ANTLR grammar not picking the right option

I'm trying to assign the result of a method call to a variable in a test program, using a Decaf grammar.
The grammar:
// Define decaf grammar
grammar Decaf;
// LEXER rules
// Base definitions for letters and digits
fragment LETTER: ('a'..'z'|'A'..'Z'|'_');
fragment DIGIT: '0'..'9';
// The other Decaf lexer rules
ID: LETTER (LETTER|DIGIT)*;
NUM: DIGIT(DIGIT)*;
CHAR: '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\'';
WS : [ \t\r\n\f]+ -> channel(HIDDEN);
COMMENT
: '/*' .*? '*/' -> channel(2)
;
LINE_COMMENT
: '//' ~[\r\n]* -> channel(2)
;
// -----------------------------------------------------------------------------------------------------------------------------------------
// PARSER rules
program:'class' 'Program' '{' (declaration)* '}';
declaration
: structDeclaration
| varDeclaration
| methodDeclaration
;
varDeclaration
: varType ID ';'
| varType ID '[' NUM ']' ';'
;
structDeclaration:'struct' ID '{' (varDeclaration)* '}' (';')?;
varType
: 'int'
| 'char'
| 'boolean'
| 'struct' ID
| structDeclaration
| 'void'
;
methodDeclaration: methodType ID '(' (parameter (',' parameter)*)* ')' block;
methodType
: 'int'
| 'char'
| 'boolean'
| 'void'
;
parameter
: parameterType ID
| parameterType ID '[' ']'
| 'void'
;
parameterType
: 'int'
| 'char'
| 'boolean'
;
block: '{' (varDeclaration)* (statement)* '}';
statement
: 'if' '(' expression ')' block ( 'else' block )? #stat_if
| 'while' '('expression')' block #stat_else
| 'return' expressionOom ';' #stat_return
| methodCall ';' #stat_mcall
| block #stat_block
| location '=' expression #stat_assignment
| (expression)? ';' #stat_line
;
expressionOom: expression |;
location: (ID|ID '[' expression ']') ('.' location)?;
expression
: location #expr_loc
| methodCall #expr_mcall
| literal #expr_literal
| '-' expression #expr_minus // Unary Minus Operation
| '!' expression #expr_not // Unary NOT Operation
| '('expression')' #expr_parenthesis
| expression arith_op_fifth expression #expr_arith5 // * / % << >>
| expression arith_op_fourth expression #expr_arith4 // + -
| expression arith_op_third expression #expr_arith3 // == != < <= > >=
| expression arith_op_second expression #expr_arith2 // &&
| expression arith_op_first expression #expr_arith1 // ||
;
methodCall: ID '(' arg1 ')';
// Either something that matches arg2 or nothing, for a method call without parameters
arg1: arg2 |;
// An expression, then * is used to allow 0 or more additional parameters
arg2: (arg)(',' arg)*;
arg: expression;
// Operations
// Split by precedence level
// Precedence specification: https://anoopsarkar.github.io/compilers-class/decafspec.html
rel_op : '<' | '>' | '<=' | '>=' ;
eq_op : '==' | '!=' ;
arith_op_fifth: '*' | '/' | '%' | '<<' | '>>';
arith_op_fourth: '+' | '-';
arith_op_third: rel_op | eq_op;
arith_op_second: '&&';
arith_op_first: '||';
literal : int_literal | char_literal | bool_literal ;
int_literal : NUM ;
char_literal : '\'' CHAR '\'' ;
bool_literal : 'true' | 'false' ;
And the test program is as follows:
class Program
{
int factorial(int b)
{
int n;
n = 1;
return n+2;
}
void main(void)
{
int a;
int b;
b=0;
a=factorial(b);
factorial(b);
return;
}
}
The parse tree for this program looks as follows, at least for the part I'm interested in, which is a=factorial(b):
This tree is wrong, since it should look like location = expression -> methodCall.
The following tree is how it looks in a friend's implementation, and mine should look roughly like this if the grammar were implemented correctly:
This is the result I need: I want the tree to look like location = expression -> methodCall and not location = expression -> location. If I remove the parameter from a=factorial(b) and leave it as a=factorial(), it is read correctly as a methodCall, so I'm not sure what I'm missing.
So my question is: where am I going wrong in the grammar? I guess it's either in location or in expression, but I'm not sure how to adjust it to behave the way I want. I more or less took the rules literally from the specification we were given.
In an ANTLR rule, alternatives are matched from top to bottom. So in your expression rule:
expression
: location #expr_loc
| methodCall #expr_mcall
...
;
the generated parser will try to match a location before it tries to match a methodCall. Try swapping those two around:
expression
: methodCall #expr_mcall
| location #expr_loc
...
;
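The effect of reordering can be sketched in a few lines of Python. This is a deliberately simplified first-match model, not ANTLR's prediction algorithm: as far as I know, ANTLR 4 looks ahead and only falls back to the lowest-numbered alternative when more than one alternative remains viable. The matcher functions below are hypothetical stand-ins for the location and methodCall rules and only look at the leading tokens.

```python
def matches_location(tokens):
    # location starts with an ID; only that leading ID is modelled here.
    return len(tokens) >= 1 and tokens[0].isidentifier()


def matches_method_call(tokens):
    # methodCall: ID '(' ... ')'
    return (
        len(tokens) >= 3
        and tokens[0].isidentifier()
        and tokens[1] == "("
        and tokens[-1] == ")"
    )


def first_match(alternatives, tokens):
    """Return the label of the first alternative whose matcher accepts `tokens`."""
    for label, matcher in alternatives:
        if matcher(tokens):
            return label
    return None


ORIGINAL_ORDER = [("expr_loc", matches_location), ("expr_mcall", matches_method_call)]
SWAPPED_ORDER = [("expr_mcall", matches_method_call), ("expr_loc", matches_location)]

call = ["factorial", "(", "b", ")"]
print(first_match(ORIGINAL_ORDER, call))  # expr_loc
print(first_match(SWAPPED_ORDER, call))   # expr_mcall
```

A plain variable such as `["a"]` resolves to expr_loc in both orders, so moving methodCall first does not take anything away from location.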

How has a Language like Apache Velocity to be parsed with Antlr4?

I am writing my own grammar to parse Apache Velocity, and I ran into the issue that I can detect neither the plain text nor the markup.
I get this message for the first line of the source:
line 1:0 extraneous input '// ${Name}.java' expecting {BREAK, FOREACH, IF, INCLUDE, PARSE, SET, STOP, '#[[', RAW_TEXT, '$'}
The input '// ${Name}.java' should be tokenized to RAW_TEXT '$' '{' IDENTIFIER '}' RAW_TEXT. The parser rules should be rawText reference rawText. These parser rules are statements.
This is my source file. It is a Java template in this case, but the source file could also be an HTML template, as mentioned in the Apache Velocity user guide.
// ${Name}.java
#foreach ( $vertice in $Vertices )
#if ( $vertice.Type == "Class" )
public class $vertice.Name {
#foreach ( $edge in $Edges )
#if ( $edge.from == $vertice.Name)
// From $edge.from to $edge.to
private $edge.to $edge.to.toLowerCase();
public $edge.to get{$edge.to}() {
return this.${edge.to.toLowerCase()};
}
public void set${edge.to}(${edge.to} new${edge.to}) {
$edge.to old${edge.to} = this.${edge.to.toLowerCase()};
if (old${edge.to} != new${edge.to}) {
if (old${edge.to} != null) {
this.${edge.to.toLowerCase()} = null;
old${edge.to}.set${edge.from}(null);
}
this.${edge.to.toLowerCase()} = new${edge.to};
if (new${edge.to} != null) {
new${edge.to}.set${edge.from}(this);
}
}
}
public $edge.from with${edge.to}(${edge.to} new${edge.to}) {
this.set${edge.to}(new${edge.to});
return this;
}
#end
#end
}
#end
#end
This is my grammar.
grammar Velocity;
/* -- Parser Rules --- */
/*
* Start Rule
*/
template
: statementSet EOF?
;
/*
* Statements
*/
statementSet
: statement+
;
statement
: rawText # RawTextStatement
| unparsed # UnparsedStatement
| reference # ReferenceStatement
| setDirective # SetStatement
| ifDirective # IfStatement
| foreachDirective # ForeachStatement
| includeDirective # IncludeStatement
| parseDirective # ParseStatement
| breakDirective # BreakStatement
| stopDirective # StopStatement
;
rawText
: RAW_TEXT
;
unparsed
: UNPARSED UnparsedText=(TEXT | NL)* UNPARSED_END
;
setDirective
: SET '(' assignment ')'
;
ifDirective
: ifPart (elseifPart)* (elsePart)? END
;
foreachDirective
: FOREACH '(' variableReference 'in' enumerable ')' statementSet END
;
includeDirective
: INCLUDE '(' stringValue (',' stringValue)* ')'
;
parseDirective
: PARSE '(' stringValue ')'
;
breakDirective
: BREAK
;
stopDirective
: STOP
;
/*
* Expressions
*/
assignment
: assignableReference '=' expression
;
expression
: reference # ReferenceExpression
| string # StringLiteralExpression
| NUMBER # NumberLiteralExpression
| array # ArrayExpression
| map # MapExpression
| range # RangeExpression
| arithmeticOperation # ArithmeticOperationExpression
| booleanOperation # BooleanOperationExpression
;
enumerable
: array
| map
| range
| reference
;
stringValue
: string # StringValue_String
| reference # StringValue_Reference
;
/*
* References
*/
reference
: DOLLAR Quiet='!'? (referenceType | '{' referenceType '}')
;
assignableReference
: DOLLAR Quiet='!'? (assignableReferenceType | '{' assignableReferenceType '}')
;
referenceType
: assignableReferenceType # ReferenceType_AssignableReferenceType
| methodReference # ReferenceType_MethodReference
;
assignableReferenceType
: variableReference # AssignableReferenceType_VariableReference
| propertyReference # AssignableReferenceType_PropertyReference
;
variableReference
: IDENTIFIER indexNotation?
;
propertyReference
: IDENTIFIER ('.' IDENTIFIER)+ indexNotation?
;
methodReference
: IDENTIFIER ('.' IDENTIFIER)* '.' IDENTIFIER '(' (expression (',' expression)*)? ')' indexNotation?
;
indexNotation
: '[' NUMBER ']' # IndexNotation_Number
| '[' reference ']' # IndexNotation_Reference
| '[' string ']' # IndexNotation_String
;
/*
* Parsed Types
*/
string
: '"' stringText* '"' # DoubleQuotedString
| '\'' TEXT? '\'' # SingleQuotedString
;
stringText
: TEXT # StringText_Text
| reference # StringText_Reference
;
/*
* Container Types
*/
array
: '[' (expression (',' expression)*)? ']'
;
map
: '{' (expression ':' expression (',' expression ':' expression))? '}'
;
range
: '[' n=NUMBER '..' m=NUMBER ']'
;
/*
* Arithmetic Operators
*/
arithmeticOperation
: sum
;
sum
: term (followingTerm)*
;
followingTerm
: Operator=('+' | '-') term
;
term
: factor (followingFactor)*
;
followingFactor
: Operator=('*' | '/' | '%') factor
;
factor
: NUMBER # Factor_Number
| reference # Factor_Reference
| '(' arithmeticOperation ')' # Factor_InnerArithmeticOperation
;
/*
* Boolean Operators
*/
booleanOperation
: disjunction
;
disjunction
: conjunction (followingConjunction)*
;
followingConjunction
: Operator=OR conjunction
;
conjunction
: booleanComparison (followingBooleanComparison)*
;
followingBooleanComparison
: Operator=AND booleanComparison
;
booleanComparison
: booleanFactor (followingBooleanFactor)*
;
followingBooleanFactor
: Operator=(EQUALS | NOT_EQUALS) booleanFactor
;
booleanFactor
: BOOLEAN # BooleanFactor_Boolean
| reference # BooleanFactor_Reference
| negation # BooleanFactor_Negation
| arithmeticComparison # BooleanFactor_ArithmeticComparison
| '(' booleanOperation ')' # BooleanFactor_InnerBooleanOperation
;
arithmeticComparison
: LeftHandSide=arithmeticOperation Operator=(EQUALS | NOT_EQUALS | GREATER_THAN | GREATER_THAN_OR_EQUAL_TO | LESS_THAN | LESS_THAN_OR_EQUAL_TO) RightHandSide=arithmeticOperation
;
negation
: NOT booleanFactor
;
/*
* Conditionals
*/
ifPart
: IF '(' booleanOperation ')' statementSet
;
elseifPart
: ELSEIF '(' booleanOperation ')' statementSet
;
elsePart
: ELSE statementSet
;
/* --- Lexer Rules --- */
/*
* Comments
*/
SINGLE_LINE_COMMENT
: '##' TEXT? NL -> skip
;
MULTI_LINE_COMMENT
: '#*' (TEXT | NL)* '*#' -> skip
;
COMMENT_BLOCK
: '#**' (TEXT | NL)* '*#' -> skip
;
/*
* Directives
*/
BREAK
: '#break'
| '#{break}'
;
DEFINE
: '#define'
| '#{define}'
;
ELSE
: '#else'
| '#{else}'
;
ELSEIF
: '#elseif'
| '#{elseif}'
;
END
: '#end'
| '#{end}'
;
EVALUATE
: '#evaluate'
| '#{evaluate}'
;
FOREACH
: '#foreach'
| '#{foreach}'
;
IF
: '#if'
| '#{if}'
;
INCLUDE
: '#include'
| '#{include}'
;
MACRO
: '#macro'
| '#{macro}'
;
PARSE
: '#parse'
| '#{parse}'
;
SET
: '#set'
| '#{set}'
;
STOP
: '#stop'
| '#{stop}'
;
UNPARSED
: '#[['
;
UNPARSED_END
: ']]#'
;
/*
* Identifier
*/
DOLLAR
: '$' -> more
;
IDENTIFIER
: CHARACTER+ (CHARACTER | INTEGER | HYPHEN | UNDERSCORE)*
;
/*
* Boolean Values
*/
TRUE
: 'true'
;
FALSE
: 'false'
;
/*
* Boolean Operators
*/
EQUALS
: '=='
| 'eq'
;
NOT_EQUALS
: '!='
| 'ne'
;
GREATER_THAN
: '>'
| 'gt'
;
GREATER_THAN_OR_EQUAL_TO
: '>='
| 'ge'
;
LESS_THAN
: '<'
| 'lt'
;
LESS_THAN_OR_EQUAL_TO
: '<='
| 'le'
;
OR
: '||'
;
AND
: '&&'
;
NOT
: '!'
| 'not'
;
/*
* Literals
*/
BOOLEAN
: TRUE
| FALSE
;
NUMBER
: '-'? INTEGER
| '-'? INTEGER '.' INTEGER
;
/*
* Content
*/
RAW_TEXT
: ~[*#$]+
;
TEXT
: (ESC | SAFE_CODE_POINT)+
;
fragment ESC
: '\\' (["\\/#$!bftrn] | UNICODE)
;
fragment UNICODE
: 'u' HEX HEX HEX HEX
;
fragment HEX
: [0-9a-fA-F]
;
fragment SAFE_CODE_POINT
: ~["\\\u0000-\u001F]
;
/*
* Atomic elements
*/
CHARACTER
: [a-zA-Z]+
;
INTEGER
: [0-9]+
;
HYPHEN
: '-'
;
UNDERSCORE
: '_'
;
NL
: '\r'
| '\n'
| '\r\n'
;
WS
: ('\t' | ' ' | '\r' | '\n' | '\r\n')+ -> skip
;
What details am I missing here? What has to be done to actually parse Velocity code?
Best Regards
Update:
I have changed these lexer rules.
DOLLAR
: '$'
;
RAW_TEXT
: ~[*#$]*
;
TEXT
: (ESC | SAFE_CODE_POINT)*?
;
fragment SAFE_CODE_POINT
: ~[$"\\\u0000-\u001F]
;
And now I'm getting these messages:
[0] line 1:4 mismatched input '{Name}.java\r\n' expecting {'!', '{', IDENTIFIER}
[0] line 2:8 mismatched input ' ( ' expecting '('
[0] line 2:12 mismatched input 'vertice in ' expecting {'!', '{', IDENTIFIER}
[0] line 2:24 mismatched input 'Vertices )\r\n' expecting {'!', '{', IDENTIFIER}
[0] line 3:3 mismatched input ' ( ' expecting '('
[0] line 3:7 mismatched input 'vertice.Type == "Class" )\r\npublic class ' expecting {'!', '{', IDENTIFIER}
[0] line 4:14 mismatched input 'vertice.Name {\r\n\t' expecting {'!', '{', IDENTIFIER}
[0] line 5:9 mismatched input ' ( ' expecting '('
[0] line 5:13 mismatched input 'edge in ' expecting {'!', '{', IDENTIFIER}
[0] line 5:22 mismatched input 'Edges )\r\n\t' expecting {'!', '{', IDENTIFIER}
[0] line 6:4 mismatched input ' ( ' expecting '('
[0] line 6:8 mismatched input 'edge.from == ' expecting {'!', '{', IDENTIFIER}
[0] line 6:22 mismatched input 'vertice.Name)\r\n\t' expecting {'!', '{', IDENTIFIER}
That helped, but the lexer is still swallowing the $ symbol. Also, why does the parser expect a '{' character when the input already starts with a '{' character? I will look into this further.

ANTLR4 Unexpected Parse Behavior

I am trying to build a new language with ANTLR, and I have run into a problem. I am trying to support numerical expressions and mathematical operations on numbers (pretty important, I reckon), but the parser doesn't behave the way I expect. Here is my grammar:
grammar Lumos;
/*
* Parser Rules
*/
program : 'start' stat+ 'stop';
block : stat*
;
stat : assign
| numop
| if_stat
| while_stat
| display
;
assign : LET ID BE expr ;
display : DISPLAY expr ;
numop : add | subtract | multiply | divide ;
add : 'add' expr TO ID ;
subtract : 'subtract' expr 'from' ID ;
divide : 'divide' ID BY expr ;
multiply : 'multiply' ID BY expr ;
append : 'append' expr TO ID ;
if_stat
: IF condition_block (ELSE IF condition_block)* (ELSE stat_block)?
;
condition_block
: expr stat_block
;
stat_block
: OBRACE block CBRACE
| stat
;
while_stat
: WHILE expr stat_block
;
expr : expr POW<assoc=right> expr #powExpr
| MINUS expr #unaryExpr
| NOT expr #notExpr
| expr op=(TIMES|DIV|MOD) expr #multiplicativeExpr
| expr op=(PLUS|MINUS) expr #additiveExpr
| expr op=RELATIONALOPERATOR expr #relationalExpr
| expr op=EQUALITYOPERATOR expr #equalityExpr
| expr AND expr #andExpr
| expr OR expr #orExpr
//| ARRAY #arrayExpr
| atom #atomExpr
;
atom : LPAREN expr RPAREN #parExpr
| (INT|FLOAT) #numberExpr
| (TRUE|FALSE) #booleanAtom
| ID #idAtom
| STRING #stringAtom
| NIX #nixAtom
;
compileUnit : EOF ;
/*
* Lexer Rules
*/
fragment LETTER : [a-zA-Z] ;
MATHOP : PLUS
| MINUS
| TIMES
| DIV
| MOD
| POW
;
RELATIONALOPERATOR : LTEQ
| GTEQ
| LT
| GT
;
EQUALITYOPERATOR : EQ
| NEQ
;
LPAREN : '(' ;
RPAREN : ')' ;
LBRACE : '{' ;
RBRACE : '}' ;
OR : 'or' ;
AND : 'and' ;
BY : 'by' ;
TO : 'to' ;
FROM : 'from' ;
LET : 'let' ;
BE : 'be' ;
EQ :'==' ;
NEQ :'!=' ;
LTEQ :'<=' ;
GTEQ :'>=' ;
LT :'<' ;
GT :'>' ;
//Different statements will choose between these, but they are pretty much the same.
PLUS :'plus' ;
ADD :'add' ;
MINUS :'minus' ;
SUBTRACT :'sub' ;
TIMES :'times' ;
MULT :'multiply' ;
DIV :'divide' ;
MOD :'mod' ;
POW :'pow' ;
NOT :'not' ;
TRUE :'true' ;
FALSE :'false' ;
NIX :'nix' ;
IF :'if' ;
THEN :'then' ;
ELSE :'else' ;
WHILE :'while' ;
DISPLAY :'display' ;
ARRAY : '['(INT|FLOAT)(','(INT|FLOAT))+']';
ID : [a-z]+ ;
WORD : LETTER+ ;
//NUMBER : INT | FLOAT ;
INT : [0-9]+ ;
FLOAT : [0-9]+ '.' [0-9]*
| '.'[0-9]+
;
COMMENT : '#' ~[\r\n]* -> channel(HIDDEN) ;
WS : [ \n\t\r]+ -> channel(HIDDEN) ;
STRING : '"' (~["{}])+ '"' ;
When given the input let foo be 5 times 3, the visitor sees let foo be 5 and an extraneous times 3. I thought I set up the expr rule so that it would recognize a multiplication expression before it recognizes atoms, so this wouldn't happen. I don't know where I went wrong, but it does not work how I expected.
If anyone has any idea where I went wrong or how I can fix this problem, I would appreciate your input.
You're using TIMES in your parser rules, but the MATHOP rule also matches the same text as TIMES, and since MATHOP is defined before your TIMES rule, it takes precedence. The lexer therefore never produces a TIMES token, which is why the alternative expr op=(TIMES|DIV|MOD) expr isn't matched.
I don't see you using this MATHOP rule anywhere in your parser rules, so I recommend removing it altogether.
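A quick way to spot this kind of problem is to check which lexer rules can never win because an earlier rule already matches the same literal. Below is a rough Python sketch of such a check. The `find_shadowed` helper and the rule table are hypothetical (hand-copied from the arithmetic keywords in the grammar above), and the check only understands rules made of plain literals, so rules like ID or INT are left out.

```python
def find_shadowed(rules):
    """Report literals claimed by an earlier rule.

    `rules` is an ordered list of (rule_name, literals). Returns a list of
    (shadowed_rule, literal, winning_rule) tuples in definition order.
    """
    first_owner = {}  # literal -> name of the first rule that matches it
    shadowed = []
    for name, literals in rules:
        for literal in literals:
            if literal in first_owner:
                shadowed.append((name, literal, first_owner[literal]))
            else:
                first_owner[literal] = name
    return shadowed


# MATHOP is listed first, as in the grammar from the question.
LUMOS_RULES = [
    ("MATHOP", ["plus", "minus", "times", "divide", "mod", "pow"]),
    ("PLUS", ["plus"]),
    ("MINUS", ["minus"]),
    ("TIMES", ["times"]),
    ("DIV", ["divide"]),
    ("MOD", ["mod"]),
    ("POW", ["pow"]),
]

for rule, literal, winner in find_shadowed(LUMOS_RULES):
    print(f"{rule} ('{literal}') is shadowed by {winner}")

# After removing MATHOP nothing is shadowed any more.
without_mathop = [rule for rule in LUMOS_RULES if rule[0] != "MATHOP"]
print(find_shadowed(without_mathop))  # []
```

In this table every one of PLUS, MINUS, TIMES, DIV, MOD and POW is reported as shadowed by MATHOP, which matches the behaviour described above for the input let foo be 5 times 3.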

Precedence in Antlr using parentheses

We are developing a DSL, and we're facing some problems:
Problem 1:
In our DSL, it's allowed to do this:
A + B + C
but not this:
A + B - C
If the user needs to use two or more different operators, he'll need to insert parentheses:
A + (B - C) or (A + B) - C.
Problem 2:
In our DSL, the operation with the highest precedence must be surrounded by parentheses.
For example, instead of using this way:
A + B * C
The user needs to use this:
A + (B * C)
To solve Problem 1, I've got an ANTLR snippet that works, but I'm not sure it's the best way to solve it:
sumExpr
@init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr;
To solve Problem 2, I have no idea where to start.
I'd appreciate your help finding a better solution to the first problem and a direction for solving the second one. (Sorry for my bad English.)
Below is the grammar that we have developed:
grammar TclGrammar;
options {
output=AST;
ASTLabelType=CommonTree;
}
@members {
public boolean isSum(String type) {
System.out.println("Tipo: " + type);
return "+".equals(type);
}
public boolean isSub(String type) {
System.out.println("Tipo: " + type);
return "-".equals(type);
}
}
prog
: exprMain ';' {System.out.println(
$exprMain.tree == null ? "null" : $exprMain.tree.toStringTree());}
;
exprMain
: exprQuando? (exprDeveSatis | exprDeveFalharCaso)
;
exprDeveSatis
: 'DEVE SATISFAZER' '{'! expr '}'!
;
exprDeveFalharCaso
: 'DEVE FALHAR CASO' '{'! expr '}'!
;
exprQuando
: 'QUANDO' '{'! expr '}'!
;
expr
: logicExpr
;
logicExpr
: boolExpr (('E'|'OU')^ boolExpr)*
;
boolExpr
: comparatorExpr
| emExpr
| 'VERDADE'
| 'FALSO'
;
emExpr
: FIELD 'EM' '[' (variable_lista | field_lista) comCruzamentoExpr? ']'
-> ^('EM' FIELD (variable_lista+)? (field_lista+)? comCruzamentoExpr?)
;
comCruzamentoExpr
: 'COM CRUZAMENTO' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COM CRUZAMENTO' FIELD+)
;
comparatorExpr
: sumExpr (('<'^|'<='^|'>'^|'>='^|'='^|'<>'^) sumExpr)?
| naoPreenchidoExpr
| preenchidoExpr
;
naoPreenchidoExpr
: FIELD 'NAO PREENCHIDO' -> ^('NAO PREENCHIDO' FIELD)
;
preenchidoExpr
: FIELD 'PREENCHIDO' -> ^('PREENCHIDO' FIELD)
;
sumExpr
@init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr
;
multExpr
: funcExpr(('*'^|'/'^) funcExpr)?
;
funcExpr
: 'QUANTIDADE'^ '('! FIELD ')'!
| 'EXTRAI_TEXTO'^ '('! FIELD ';' INTEGER ';' INTEGER ')'!
| cruzaExpr
| 'COMBINACAO_UNICA' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COMBINACAO_UNICA' FIELD+)
| 'EXISTE'^ '('! FIELD ')'!
| 'UNICO'^ '('! FIELD ')'!
| atom
;
cruzaExpr
: operadorCruzaExpr ('CRUZA COM'^|'CRUZA AMBOS'^) operadorCruzaExpr ondeExpr?
;
operadorCruzaExpr
: FIELD('('!field_lista')'!)?
;
ondeExpr
: ('ONDE'^ '('!expr')'!)
;
atom
: FIELD
| VARIABLE
| '('! expr ')'!
;
field_lista
: FIELD(';' field_lista)?
;
variable_lista
: VARIABLE(';' variable_lista)?
;
FIELD
: NONCONTROL_CHAR+
;
NUMBER
: INTEGER | FLOAT
;
VARIABLE
: '\'' NONCONTROL_CHAR+ '\''
;
fragment SIGN: '+' | '-';
fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SYMBOL: '_' | '.' | ',';
fragment FLOAT: INTEGER '.' '0'..'9'*;
fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {skip();}
;
This is similar to not having operator precedence at all.
expr
: funcExpr
( ('+' funcExpr)*
| ('-' funcExpr)*
| ('*' funcExpr)*
| ('/' funcExpr)*
)
;
I think the following should work. I'm assuming some lexer tokens with obvious names.
expr: sumExpr;
sumExpr: onlySum | subExpr;
onlySum: atom ( PLUS onlySum )?;
subExpr: onlySub | multExpr;
onlySub: atom ( MINUS onlySub )? ;
multExpr: atom ( STAR atom )? ;
parenExpr: OPEN_PAREN expr CLOSE_PAREN;
atom: FIELD | VARIABLE | parenExpr;
The only* rules match an expression if it only has one type of operator outside of parentheses. The *Expr rules match either a line with the appropriate type of operators or go to the next operator.
If you have multiple types of operators, then they are forced to be inside parentheses because the match will go through atom.
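The "one operator type per parenthesis level" rule can also be sketched outside ANTLR to check the examples from the question. The following Python is only an illustration under my own assumptions: all four operators are treated uniformly (the same operator may repeat at one level), operands are plain identifiers, and the function names are made up for this sketch.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[-+*/()])")
OPERATORS = {"+", "-", "*", "/"}


def tokenize(text):
    """Split the input into identifiers, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens, i):
    """atom: identifier | '(' level ')'. Returns the index after the atom."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    if tokens[i] == "(":
        i = parse_level(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("expected ')'")
        return i + 1
    if tokens[i] in OPERATORS or tokens[i] == ")":
        raise ValueError(f"unexpected token {tokens[i]!r}")
    return i + 1


def parse_level(tokens, i):
    """level: atom (op atom)*, where every op at this level must be the same."""
    i = parse_atom(tokens, i)
    level_op = None
    while i < len(tokens) and tokens[i] in OPERATORS:
        if level_op is None:
            level_op = tokens[i]
        elif tokens[i] != level_op:
            raise ValueError("mixed operators need parentheses")
        i = parse_atom(tokens, i + 1)
    return i


def is_valid(text):
    """True if the whole input parses with one operator type per level."""
    try:
        tokens = tokenize(text)
        return parse_level(tokens, 0) == len(tokens)
    except ValueError:
        return False


for example in ["A + B + C", "A + B - C", "A + (B - C)", "A + B * C", "A + (B * C)"]:
    print(example, "->", is_valid(example))
```

Under these assumptions, A + B + C, A + (B - C) and A + (B * C) are accepted, while A + B - C and A + B * C are rejected, which covers both Problem 1 and Problem 2.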