Precedence in Antlr using parentheses - antlr

We are developing a DSL, and we're facing some problems:
Problem 1:
In our DSL, it's allowed to do this:
A + B + C
but not this:
A + B - C
If the user needs to use two or more different operators, he'll need to insert parentheses:
A + (B - C) or (A + B) - C.
Problem 2:
In our DSL, the most precedent operator must be surrounded by parentheses.
For example, instead of using this way:
A + B * C
The user needs to use this:
A + (B * C)
To solve the Problem 1 I've got a snippet of ANTLR that worked, but I'm not sure if it's the best way to solve it:
sumExpr
#init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr;
To solve the Problem 2, I have no idea where to start.
I appreciate your help to find out a better solution to the first problem and a direction to solve the seconde one. (Sorry for my bad english)
Below is the grammar that we have developed:
grammar TclGrammar;
options {
output=AST;
ASTLabelType=CommonTree;
}
#members {
public boolean isSum(String type) {
System.out.println("Tipo: " + type);
return "+".equals(type);
}
public boolean isSub(String type) {
System.out.println("Tipo: " + type);
return "-".equals(type);
}
}
prog
: exprMain ';' {System.out.println(
$exprMain.tree == null ? "null" : $exprMain.tree.toStringTree());}
;
exprMain
: exprQuando? (exprDeveSatis | exprDeveFalharCaso)
;
exprDeveSatis
: 'DEVE SATISFAZER' '{'! expr '}'!
;
exprDeveFalharCaso
: 'DEVE FALHAR CASO' '{'! expr '}'!
;
exprQuando
: 'QUANDO' '{'! expr '}'!
;
expr
: logicExpr
;
logicExpr
: boolExpr (('E'|'OU')^ boolExpr)*
;
boolExpr
: comparatorExpr
| emExpr
| 'VERDADE'
| 'FALSO'
;
emExpr
: FIELD 'EM' '[' (variable_lista | field_lista) comCruzamentoExpr? ']'
-> ^('EM' FIELD (variable_lista+)? (field_lista+)? comCruzamentoExpr?)
;
comCruzamentoExpr
: 'COM CRUZAMENTO' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COM CRUZAMENTO' FIELD+)
;
comparatorExpr
: sumExpr (('<'^|'<='^|'>'^|'>='^|'='^|'<>'^) sumExpr)?
| naoPreenchidoExpr
| preenchidoExpr
;
naoPreenchidoExpr
: FIELD 'NAO PREENCHIDO' -> ^('NAO PREENCHIDO' FIELD)
;
preenchidoExpr
: FIELD 'PREENCHIDO' -> ^('PREENCHIDO' FIELD)
;
sumExpr
#init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr
;
multExpr
: funcExpr(('*'^|'/'^) funcExpr)?
;
funcExpr
: 'QUANTIDADE'^ '('! FIELD ')'!
| 'EXTRAI_TEXTO'^ '('! FIELD ';' INTEGER ';' INTEGER ')'!
| cruzaExpr
| 'COMBINACAO_UNICA' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COMBINACAO_UNICA' FIELD+)
| 'EXISTE'^ '('! FIELD ')'!
| 'UNICO'^ '('! FIELD ')'!
| atom
;
cruzaExpr
: operadorCruzaExpr ('CRUZA COM'^|'CRUZA AMBOS'^) operadorCruzaExpr ondeExpr?
;
operadorCruzaExpr
: FIELD('('!field_lista')'!)?
;
ondeExpr
: ('ONDE'^ '('!expr')'!)
;
atom
: FIELD
| VARIABLE
| '('! expr ')'!
;
field_lista
: FIELD(';' field_lista)?
;
variable_lista
: VARIABLE(';' variable_lista)?
;
FIELD
: NONCONTROL_CHAR+
;
NUMBER
: INTEGER | FLOAT
;
VARIABLE
: '\'' NONCONTROL_CHAR+ '\''
;
fragment SIGN: '+' | '-';
fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SYMBOL: '_' | '.' | ',';
fragment FLOAT: INTEGER '.' '0'..'9'*;
fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {skip();}
;

This is similar to not having operator precedence at all.
expr
: funcExpr
( ('+' funcExpr)*
| ('-' funcExpr)*
| ('*' funcExpr)*
| ('/' funcExpr)*
)
;

I think the following should work. I'm assuming some lexer tokens with obvious names.
expr: sumExpr;
sumExpr: onlySum | subExpr;
onlySum: atom ( PLUS onlySum )?;
subExpr: onlySub | multExpr;
onlySub: atom ( MINUS onlySub )? ;
multExpr: atom ( STAR atomic )? ;
parenExpr: OPEN_PAREN expr CLOSE_PAREN;
atom: FIELD | VARIABLE | parenExpr
The only* rules match an expression if it only has one type of operator outside of parentheses. The *Expr rules match either a line with the appropriate type of operators or go to the next operator.
If you have multiple types of operators, then they are forced to be inside parentheses because the match will go through atom.

Related

ANTLR grammar not picking the right option

So, I'm trying to assign a method value to a var in a test program, I'm using a Decaf grammar.
The grammar:
// Define decaf grammar
grammar Decaf;
// Reglas LEXER
// Definiciones base para letras y digitos
fragment LETTER: ('a'..'z'|'A'..'Z'|'_');
fragment DIGIT: '0'..'9';
// Las otras reglas de lexer de Decaf
ID: LETTER (LETTER|DIGIT)*;
NUM: DIGIT(DIGIT)*;
CHAR: '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\'';
WS : [ \t\r\n\f]+ -> channel(HIDDEN);
COMMENT
: '/*' .*? '*/' -> channel(2)
;
LINE_COMMENT
: '//' ~[\r\n]* -> channel(2)
;
// -----------------------------------------------------------------------------------------------------------------------------------------
// Reglas PARSER
program:'class' 'Program' '{' (declaration)* '}';
declaration
: structDeclaration
| varDeclaration
| methodDeclaration
;
varDeclaration
: varType ID ';'
| varType ID '[' NUM ']' ';'
;
structDeclaration:'struct' ID '{' (varDeclaration)* '}' (';')?;
varType
: 'int'
| 'char'
| 'boolean'
| 'struct' ID
| structDeclaration
| 'void'
;
methodDeclaration: methodType ID '(' (parameter (',' parameter)*)* ')' block;
methodType
: 'int'
| 'char'
| 'boolean'
| 'void'
;
parameter
: parameterType ID
| parameterType ID '[' ']'
| 'void'
;
parameterType
: 'int'
| 'char'
| 'boolean'
;
block: '{' (varDeclaration)* (statement)* '}';
statement
: 'if' '(' expression ')' block ( 'else' block )? #stat_if
| 'while' '('expression')' block #stat_else
| 'return' expressionOom ';' #stat_return
| methodCall ';' #stat_mcall
| block #stat_block
| location '=' expression #stat_assignment
| (expression)? ';' #stat_line
;
expressionOom: expression |;
location: (ID|ID '[' expression ']') ('.' location)?;
expression
: location #expr_loc
| methodCall #expr_mcall
| literal #expr_literal
| '-' expression #expr_minus // Unary Minus Operation
| '!' expression #expr_not // Unary NOT Operation
| '('expression')' #expr_parenthesis
| expression arith_op_fifth expression #expr_arith5 // * / % << >>
| expression arith_op_fourth expression #expr_arith4 // + -
| expression arith_op_third expression #expr_arith3 // == != < <= > >=
| expression arith_op_second expression #expr_arith2 // &&
| expression arith_op_first expression #expr_arith1 // ||
;
methodCall: ID '(' arg1 ')';
// Puede ir algo que coincida con arg2 o nada, en caso de una llamada a metodo sin parametro
arg1: arg2 |;
// Expression y luego se utiliza * para permitir 0 o más parametros adicionales
arg2: (arg)(',' arg)*;
arg: expression;
// Operaciones
// Divididas por nivel de precedencia
// Especificación de precedencia: https://anoopsarkar.github.io/compilers-class/decafspec.html
rel_op : '<' | '>' | '<=' | '>=' ;
eq_op : '==' | '!=' ;
arith_op_fifth: '*' | '/' | '%' | '<<' | '>>';
arith_op_fourth: '+' | '-';
arith_op_third: rel_op | eq_op;
arith_op_second: '&&';
arith_op_first: '||';
literal : int_literal | char_literal | bool_literal ;
int_literal : NUM ;
char_literal : '\'' CHAR '\'' ;
bool_literal : 'true' | 'false' ;
And the test program is as follows:
class Program
{
int factorial(int b)
{
int n;
n = 1;
return n+2;
}
void main(void)
{
int a;
int b;
b=0;
a=factorial(b);
factorial(b);
return;
}
}
The parse tree for this program looks as following, at least for the part I'm interested which is a=factorial(b):
This tree is wrong, since it should look like location = expression -> methodCall
The following tree is how it looks on a friend's implementation, and it should sort of look like this if the grammar was correctly implemented:
This is correctly implemented, or the result I need, since I want the tree to look like location = expression -> methodCall and not location = expression -> location. If I remove the parameter from a=factorial(b) and leave it as a=factorial(), it will be read correctly as a methodCall, so I'm not sure what I'm missing.
So my question is, I'm not sure where I'm messing up in the grammar, I guess it's either on location or expression, but I'm not sure how to adjust it to behave the way I want it to. I sort of just got the rules literally from the specification we were provided.
In an ANTLR rule, alternatives are matches from top to bottom. So in your expression rule:
expression
: location #expr_loc
| methodCall #expr_mcall
...
;
the generated parser will try to match a location before it tries to match a methodCall. Try swapping those two around:
expression
: methodCall #expr_mcall
| location #expr_loc
...
;

Antlr4 ignoring newlines at all but one point

I am writing a parser for a scripting language, and using antlr 4.5.3 for the purpose.
grammar VSE;
chunk
: block* EOF
;
block
: var '=' exp
| functioncall
;
var
: NAME
| var '[' exp ']'
| var '.' var
;
exp
: number
| string
| var
| functioncall
| <assoc=right> exp exp //concat
;
functioncall
: NAME '(' (exp)? (',' exp)* ')'
| var '.' functioncall
;
string
: '"' (~('"' | '\\' | '\r' | '\n') | '\\' ('"' | '\\'))* '"'
;
NAME
: [a-zA-Z_][a-zA-Z_0-9]*
;
number
: INT | HEX | FLOAT
;
INT
: Digit+
;
HEX
: '0' [xX] [0-9a-fA-F]+
;
FLOAT
: Digit* '.' Digit+
;
Digit
: [0-9]
;
WS
: [ \t\u000C\r\n]+ -> skip
;
However, while testing it, I found a variable assignment like var = something followed by some function call in next line leads to a concat statement. (My concat statement is a variable followed by another like var = var1 var2) I understand that antlr is skipping ALL the new lines in favor of line continuation, but I'd like to add the condition that if there is a new line between two exps, it would treat them as two separate blocks instead of a concat statement. i.e.
var = var2
functioncall(var)
These should be two separate blocks instead of concat statement.
Is there any way to do this?
Does the following rule suitable for you?
block
: var '=' exp NEW_LINE
| functioncall NEW_LINE
;
NEW_LINE: '\r'? '\n'
WS
: [ \t]+ -> skip
;
In another case you should use Semantic Predicates or very unclear grammar.

Using floats in range and grammar mistakes?

I am trying to convert a LALR grammar to LL using ANTLR and I am running into a few problems. So far, I think converting the expressions into a Top-Down approach is straight forward to me. The problem is when I include Range (1..10) and (1.0..10.0) with floats.
I have tried to use the answer found here and somehow it is not even running correctly with my code, let alone solving a range of float, i.e. (float..float).
Float literal and range parameter in ANTLR
Attached is a sample of my grammar that just focuses on this issue.
grammar Test;
options {
language = Java;
output = AST;
}
parse: 'in' rangeExpression ';'
;
rangeExpression : expression ('..' expression)?
;
expression : addingExpression (('=='|'!='|'<='|'<'|'>='|'>') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/'|'div') unaryExpression)*
;
unaryExpression: ('+'|'-')* primitiveElement;
primitiveElement : literalExpression
| id ('.' id)?
| '(' expression ')'
;
literalExpression : NUMBER
| BOOLEAN_LITERAL
| 'infinity'
;
id : IDENTIFIER
;
// L E X I C A L R U L E S
Range
: '..'
;
NUMBER
: (DIGITS Range) => DIGITS {$type=DIGITS;}
| (FloatLiteral) => FloatLiteral {$type=FloatLiteral;}
| DIGITS {$type=DIGITS;}
;
// fragments
fragment FloatLiteral : Float;
fragment Float
: DIGITS ( options {greedy = true; } : '.' DIGIT* EXPONENT?)
| '.' DIGITS EXPONENT?
| DIGITS EXPONENT
;
BOOLEAN_LITERAL : 'false'
| 'true'
;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
Any reason why it is not even taking:
in 10;
or
in 10.0;
Thanks in advance!
The following things are not correct:
you're never matching a FloatLiteral in your literalExpression rule
in every alternative of NUMBER you're changing the type of the token, therefor a NUMBER token will never be created
Something like this should work for both 11..22 and 1.1..2.2:
...
literalExpression : INT
| BOOLEAN_LITERAL
| FLOAT
| 'infinity'
;
id : IDENTIFIER
;
// L E X I C A L R U L E S
Range
: '..'
;
INT
: (DIGITS Range)=> DIGITS
| DIGITS (('.' DIGITS EXPONENT? | EXPONENT) {$type=FLOAT;})?
;
BOOLEAN_LITERAL : 'false'
| 'true'
;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment FLOAT : ;
To your question about handling (1.0 .. 10.0):
Notice that parser rule primitiveElement defines an alternative as '(' expression ')', but rule expression can never reach rule rangeExpression.
Consider redefining expression and rangeExpression like so:
expression : rangeExpression
;
rangeExpression : compExpression ('..' compExpression)?
;
compExpression : addingExpression (('=='|'!='|'<='|'<'|'>='|'>') addingExpression)*
;
This ensures that the expression rule sits above all forms of expressions and will work as expected in parentheses.

How to resolve "The following alternatives can never be matched"

I have been struggling to resolve a "multiple alternatives" error in my parser for a couple of days now but with no success. I have been converting Bart Kiers excellent Tiny Language(TL) tutorial code to C# using Sam Harwell's port of ANTLR3 and VS2010. Kudos to both these guys for their excellent work. I believe I have followed Bart's tutorial accurately but as I am a newbie with ANTLR I can't be sure.
I did have the TL code working nicely on a pure math basis i.e. no "functions" or "if then else" or "while" (see screenshot of a little app)
but when I added the code for the missing pieces to complete the tutorial I get a parsing error in "functionCall" and in "list" (see the code below)
grammar Paralex2;
options {
language=CSharp3;
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BLOCK;
RETURN;
STATEMENTS;
ASSIGNMENT;
FUNC_CALL;
EXP;
EXP_LIST;
ID_LIST;
IF;
TERNARY;
U_SUB;
NEGATE;
FUNCTION;
INDEXES;
LIST;
LOOKUP;
}
#lexer::namespace{Paralex2}
#parser::namespace{Paralex2}
/*
* Parser Rules
*/
#parser::header {using System; using System.Collections.Generic;}
#parser::members{
public SortedList<string, Function> functions = new SortedList<string, Function>();
private void defineFunction(string id, Object idList, Object block) {
// `idList` is possibly null! Create an empty tree in that case.
CommonTree idListTree = idList == null ? new CommonTree() : (CommonTree)idList;
// `block` is never null.
CommonTree blockTree = (CommonTree)block;
// The function name with the number of parameters after it the unique key
string key = id + idListTree.Children.Count();
functions.Add(key, new Function(id, idListTree, blockTree));
}
}
public parse
: block EOF -> block
;
block
: (statement | functionDecl)* (Return exp ';')? -> ^(BLOCK ^(STATEMENTS statement*) ^(RETURN exp?))
;
statement
: assignment ';' -> assignment
| functionCall ';' -> functionCall
| ifStatement
| forStatement
| whileStatement
;
assignment
: Identifier indexes? '=' exp
-> ^(ASSIGNMENT Identifier indexes? exp)
;
functionCall
: Identifier '(' expList? ')' -> ^(FUNC_CALL Identifier expList?)
| Assert '(' exp ')' -> ^(FUNC_CALL Assert exp)
| Size '(' exp ')' -> ^(FUNC_CALL Size exp)
;
ifStatement
: ifStat elseIfStat* elseStat? End -> ^(IF ifStat elseIfStat* elseStat?)
;
ifStat
: If exp Do block -> ^(EXP exp block)
;
elseIfStat
: Else If exp Do block -> ^(EXP exp block)
;
elseStat
: Else Do block -> ^(EXP block)
;
functionDecl
: Def Identifier '(' idList? ')' block End
{defineFunction($Identifier.text, $idList.tree, $block.tree);}
;
forStatement
: For Identifier '=' exp To exp Do block End
-> ^(For Identifier exp exp block)
;
whileStatement
: While exp Do block End -> ^(While exp block)
;
idList
: Identifier (',' Identifier)* -> ^(ID_LIST Identifier+)
;
expList
: exp (',' exp)* -> ^(EXP_LIST exp+)
;
exp
: condExp
;
condExp
: (orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
orExp
: andExp ('||'^ andExp)*
;
andExp
: equExp ('&&'^ equExp)*
;
equExp
: relExp (('==' | '!=')^ relExp)*
;
relExp
: addExp (('>=' | '<=' | '>' | '<')^ addExp)*
;
addExp
: mulExp ((Add | Sub)^ mulExp)*
;
mulExp
: powExp ((Mul | Div)^ powExp)*
;
powExp
: unaryExp ('^'^ unaryExp)*
;
unaryExp
: Sub atom -> ^(U_SUB atom)
| '!' atom -> ^(NEGATE atom)
| atom
;
atom
: Nmber
| Bool
| Null
| lookup
;
list
: '[' expList? ']' -> ^(LIST expList?)
;
lookup
: list indexes? -> ^(LOOKUP list indexes?)
| functionCall indexes? -> ^(LOOKUP functionCall indexes?)
| Identifier indexes? -> ^(LOOKUP Identifier indexes?)
| String indexes? -> ^(LOOKUP String indexes?)
| '(' exp ')' indexes? -> ^(LOOKUP exp indexes?)
;
indexes
: ('[' exp ']')+ -> ^(INDEXES exp+)
;
/*
* Lexer Rules
*/
Assert : 'assert';
Size : 'size';
Def : 'def';
If : 'if';
Else : 'else';
Return : 'return';
For : 'for';
While : 'while';
To : 'to';
Do : 'do';
End : 'end';
In : 'in';
Null : 'null';
Or : '||';
And : '&&';
Equals : '==';
NEquals : '!=';
GTEquals : '>=';
LTEquals : '<=';
Pow : '^';
GT : '>';
LT : '<';
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Modulus : '%';
OBrace : '{';
CBrace : '}';
OBracket : '[';
CBracket : ']';
OParen : '(';
CParen : ')';
SColon : ';';
Assign : '=';
Comma : ',';
QMark : '?';
Colon : ':';
Bool
: 'true'
| 'false'
;
Nmber
: Int ('.' Digit*)?
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | Digit)*
;
String
#after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {Skip();}
| '/*' .* '*/' {Skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {Skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
The error messages I get are
Decision can match input such as "CParen" using multiple alternatives: 1, 2 : Line 79:20
and
Decision can match input such as "CBracket" using multiple alternatives: 1, 2 : Line 176:10
The errors relate to the functionCall and list rules. I have examined the parser file in ANTLRWorks 1.5 and confirmed the same errors there. The syntax diagrams for the two rules look like this;
and this;
I have tried several changes to try to solve the problem but I don't seem to be able to get the syntax right. I would appreciate any help you guys could provide and can email the images if that would help.
Thanks in advance
Ian Carson
You have an OR-operator too many in the condExp rule making the grammar ambiguous.
You have:
condExp
: ( orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:
But it should be:
condExp
: ( orExp -> orExp)
( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:

ANTLR: Why the invalid input could match the grammar definition

I've written a very simple grammar definition for a calculation expression:
grammar SimpleCalc;
options {
output=AST;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
ID : ('a'..'z' | 'A' .. 'Z' | '0' .. '9')+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { Skip(); } ;
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
start: expr EOF;
expr : multExpr ((PLUS | MINUS)^ multExpr)*;
multExpr : atom ((MULT | DIV)^ atom )*;
atom : ID
| '(' expr ')' -> expr;
I've tried the invalid expression ABC &* DEF by start but it passed. It looks like the & charactor is ignored. What's the problem here?
Actually your invalid expression ABC &= DEF hasn't been passed; it causes NoViableAltException.