Operator precedence for negate operator - antlr

Update: This is not an issue with ANTLR
It is a bug in the antlr-kotlin generator that I'm using.
Original Question
I want to parse some mathematical expressions that contain variables. Here's my grammar:
expr
: '-' expr # Negate
| expr ( '*' | '/' ) expr # MultDiv
| expr ( '+' | '-' ) expr # AddSub
| '(' expr ')' # Paren
| ID # Var
| NUM # Num
;
But when I try to parse -a + b, I always get -(a + b) and not (-a) + b. How can I fix this?

As mentioned by sepp2k in the comments, this is not reproducible:

Related

Inline comments and empty line in antlr4 grammar

please can anyone explain me, what i need to change i this grammar to support inline comments (such as // some text) and empty line (which contains any number of space characters). I write following grammar, but this doesn't work.
program: line* EOF ;
line: (expression | assignment) (NEWLINE | EOF);
assignment : VARIABLE '=' expression ;
expression : '(' expression ')' #parenthesisExpression
| '-' expression #unaryExpression
| left=expression OP1 right=expression #firstPriorityExpression
| left=expression OP2 right=expression #secondPriorityExpression
| number=NUMBER #numericExpression
| variable=VARIABLE #variableExpression
;
NUMBER : [0-9]+ ;
VARIABLE : [a-zA-Z][a-zA-Z0-9]* ;
OP1 : '*' | '/' ;
OP2 : '+' | '-' ;
NEWLINE : '\r'? '\n' ;
WHITESPACE : [ \t\r]+ -> skip ;
COMMENT : '//' ~[\n\r]* -> skip ;
The fact you added - in a parser rule as a literal token, and also made OP2 match this character causes OP2 to never match a -. You need to have a lexer rule that matches only the single minus sign (as I showed earlier):
op1
: MUL
| DIV
;
op2
: ADD
| MIN
;
...
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
MIN : '-' ;
and then use MIN in your unary alternative:
...
| MIN expression #unaryExpression
...
When you have a separate MIN : '-' ; rule, you could do this:
...
| '-' expression #unaryExpression
...
because now ANTLR "knows" you mean the rule that matches a single -, but ANTLR does not "know" this when you have a lexer rule that matches a either a - or + like your OP2 rule:
OP2 : '+' | '-' ;

ANTLR4 Same Precedence different Associativity

Is it possible to create a grammar that has a rule with multiple operators with the same level of precedence but different associativity?
For example "+" and "-" both have same precedence and associativity (left assoc.)
But if I want to change "+" to right associative but with the same precedence how can I do that ?
I tried:
expr: expr op=('*'|'/') expr # MulDiv
| expr op=(<assoc=right>'+'|'-') expr # AddSub
| INT # int
| ID # id
| '(' expr ')' # parens
;
And #LucasTrzesniewski. Still does not work your suggestion
addSubOp: <assoc=right>'+'| <assoc=lefts>'-';
expr: expr op=('*'|'/') expr # MulDiv
| expr addSubOp expr # AddSub
| INT # int
| ID # id
| '(' expr ')' # parens
;
But with no success.

Is it possible to make this YACC grammar unambiguous? expr: ... | expr expr

I am writing a simple calculator in yacc / bison.
The grammar for an expression looks somewhat like this:
expr
: NUM
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '+' expr %prec '*' { $$ = $1; }
| '-' expr %prec '*' { $$ = $1; }
| '(' expr ')' { $$ = $2; }
| expr expr { $$ = $1 '*' $2; }
;
I have declared the precedence of the operators like this.
%left '+' '-'
%left '*' '/'
%nonassoc '('
The problem is with the last rule:
expr expr { $$ = $1 $2; }
I want this rule because I want to be able to write expressions like 5(3+4)(3-24) in my calculator.
Is it possible to make this grammar unambiguous?
The ambiguity results from the fact that you allow unary operators (- expr), so 2 - 2 can be parsed either as a simple subtraction (yielding 0) or as an implicit product (of 2 and -2, yielding -4).
It's clear that subtraction is intended (otherwise subtraction would be impossible to represent) so it is necessary to ban the production expr: expr expr if the second expr on the right-hand side is a unary operation.
That can't be done with precedence declarations (or at least it cannot be done in an obvious way), so the best solution is to write out the grammar explicitly, without relying on precedence to disambiguate.
You will also have to decide exactly what is the precedence of implicit multiplication: either the same as explicit multiplication/division, or stronger. That affects how ab/cd is parsed. There is no consensus that I know of, so it is more or less up to you.
In the following, I assume that implicit multiplication binds more tightly. I also ensure that -ab is parsed as (-a)b, although -(ab) has the same end result (until you start dealing with things like non-arithmetic types and automatic conversions). So just take it as an example.
term: NUM
| '(' expr ')'
unop: term
| '-' unop
| '+' unop
conc: unop
| conc term
prod: conc
| prod '*' conc
| prod '/' conc
expr: prod
| expr '+' prod
| expr '-' prod

Controlling Parameter Slurping

I'm trying to write a grammar that supports functions calls without using parentheses:
f x, y
As in Haskell, I'd like function calls to minimally slurp up their parameters. That is, I want
g 5 + 3
to mean
(g 5) + 3
instead of
g (5 + 3)
Unfortunately, I'm getting the second parse with this grammar:
grammar Parameters;
expr
: '(' expr ')'
| expr MULTIPLICATIVE_OPERATOR expr
| expr ADDITIVE_OPERATOR expr
| ID (expr (',' expr)*?)??
| INT
;
MULTIPLICATIVE_OPERATOR: [*/%];
ADDITIVE_OPERATOR: '+';
ID: [a..z]+;
INT: '-'? [0-9]+;
WHITESPACE: [ \t\n\r]+ -> skip;
The parse tree I'm getting is this:
I had thought that the subrule listed first would get attempted first. In this case, expr ADDITIVE_OPERATOR expr appears before the ID subrule, so why is the ID subrule taking higher precedence?
In this case ANTLR does not the correct rule transformation (to eliminate left recursion and to handle precedences):
expr
: expr_1[0]
;
expr_1[int p]
: ('(' expr_1[0] ')' | INT | ID (expr_1[0] (',' expr_1[0])*?)??)
( {4 >= $p}? MULTIPLICATIVE_OPERATOR expr_1[5]
| {3 >= $p}? ADDITIVE_OPERATOR expr_1[4]
)*
;
leading to (expr (expr_1 a (expr_1 5 + (expr_1 3))))
correct would be:
expr
: expr_1[0]
;
expr_1[int p]
: ('(' expr_1[0] ')' | INT | ID (expr_1[5] (',' expr_1[5])*?)??)
( {4 >= $p}? MULTIPLICATIVE_OPERATOR expr_1[5]
| {3 >= $p}? ADDITIVE_OPERATOR expr_1[4]
)*
;
leading to (expr (expr_1 a (expr_1 5) + (expr_1 3)))
I am not certain if this is a bug in ANTLR4 or a trade-off of the transformation algorithm. Perhaps one should write an issue to the ANTLR4 jira.
To solve your problem you can simply put the correctly transformed grammar into your code and it should work. The explanation of rule transformation is found in "The Definitive ANTLR4 Reference" on pages 249ff (and perhaps somewhere on the web).

In ANTLR how do I skip the value in a simple expression parser?

Hi there I have been trying to write a simple expression parser, here is the grammar.
grammar extremelysimpleexpr ;
stat : expr ;
expr : sub ;
sub : add ( '-' add )* ;
add : VAL ( '+' VAL )*
| VAL
;
VAL : [0-9]+ ;
[ \t\n\r]+ -> skip ;
It matches these expressions
1 + 1
0 + 3
4
But I do not want it to match single occurrence of VAL. I want it to match 1 + 1 but not 4. How do I do that ?
You'd have to insert predicates, something like this (untested):
stat : expr { expr.start != expr.stop }? ;
But don't do this! That's not a syntactic issue, but a semantic one. This is something you should validate after parsing, unless you want to complicate your grammar for such a little benefit.
Use visitors instead for all your checks.
By the way, your grammar assigns different precedence levels to the - and + operators... I'm not sure this is what you want.
With ANTLR4 you could just write this:
expr : '(' expr ')'
| '-' expr
| expr ('*'|'/') expr
| expr ('+'|'-') expr
| VAL
;
This grammar forces non trivial expressions by syntax:
stat : expr ( '+' expr )+
| expr ( '-' expr )+
;
expr : expr ( '+' expr )+
| expr ( '-' expr )+
| VAL
;