ANTLR4 Same Precedence different Associativity - antlr

Is it possible to create a grammar that has a rule with multiple operators with the same level of precedence but different associativity?
For example, "+" and "-" both have the same precedence and the same associativity (left-associative).
But if I want to make "+" right-associative while keeping the same precedence, how can I do that?
I tried:
expr: expr op=('*'|'/') expr # MulDiv
| expr op=(<assoc=right>'+'|'-') expr # AddSub
| INT # int
| ID # id
| '(' expr ')' # parens
;
And @LucasTrzesniewski, your suggestion still does not work:
addSubOp: <assoc=right>'+'| <assoc=lefts>'-';
expr: expr op=('*'|'/') expr # MulDiv
| expr addSubOp expr # AddSub
| INT # int
| ID # id
| '(' expr ')' # parens
;
But with no success.
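To make the question concrete, here is a minimal sketch of what "same precedence, different associativity" means in a hand-written precedence-climbing parser. It is an illustration only, written in Python rather than ANTLR, and it does not show how ANTLR handles associativity. The operator table OPS and the helper names are hypothetical.

```python
import re

# Hypothetical operator table: (precedence, associativity).
# '+' and '-' share a precedence level but differ in associativity.
OPS = {
    "+": (1, "right"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
}

def tokenize(text):
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)  # consume ')'
        return inner
    return tok

def parse(tokens, min_prec=1):
    """Precedence climbing; returns a fully parenthesized string."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = OPS[op]
        # A left-associative operator refuses same-level operators on its
        # right-hand side; a right-associative operator accepts them.
        next_min = prec + 1 if assoc == "left" else prec
        right = parse(tokens, next_min)
        left = f"({left} {op} {right})"
    return left
```

With this table, parse(tokenize("1 + 2 + 3")) gives (1 + (2 + 3)) and parse(tokenize("1 - 2 - 3")) gives ((1 - 2) - 3). Mixed chains follow whichever operator is reached first: 1 + 2 - 3 becomes (1 + (2 - 3)), while 1 - 2 + 3 becomes ((1 - 2) + 3).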

Related

Operator precedence for negate operator

Update: This is not an issue with ANTLR
It is a bug in the antlr-kotlin generator that I'm using.
Original Question
I want to parse some mathematical expressions that contain variables. Here's my grammar:
expr
: '-' expr # Negate
| expr ( '*' | '/' ) expr # MultDiv
| expr ( '+' | '-' ) expr # AddSub
| '(' expr ')' # Paren
| ID # Var
| NUM # Num
;
But when I try to parse -a + b, I always get -(a + b) and not (-a) + b. How can I fix this?
As mentioned by sepp2k in the comments, this is not reproducible:

ANTLR4 Grammar - Issue with "dot" in fields and extended expressions

I have the following ANTLR4 Grammar
grammar ExpressionGrammar;
parse: (expr)
;
expr: MIN expr
| expr ( MUL | DIV ) expr
| expr ( ADD | MIN ) expr
| NUM
| function
| '(' expr ')'
;
function : ID '(' arguments? ')';
arguments: expr ( ',' expr)*;
/* Tokens */
MUL : '*';
DIV : '/';
MIN : '-';
ADD : '+';
OPEN_PAR : '(' ;
CLOSE_PAR : ')' ;
NUM : '0' | [1-9][0-9]*;
ID : [a-zA-Z_] [a-zA-Z]*;
COMMENT: '//' ~[\r\n]* -> skip;
WS: [ \t\n]+ -> skip;
I have an input expression like this:
(Fields.V1)*(Fields.V2) + (Constants.Value1)*(Constants.Value2)
The ANTLR parser generated the following text from the grammar above:
(FieldsV1)*(FieldsV2)+(Constants<missing ')'>
As you can see, the dots in Fields.V1 and Fields.V2 are missing from the text, and there is also a <missing ')'> error node. I believe I should somehow make ANTLR understand that an expression can also contain fields with dot operators.
A follow-up question:
(Var1)(Var2)
ANTLR does not report an error for the input above. An expression should never be (Var1)(Var2); it should always have an operator, as in (Var1)*(Var2) or (Var1)+(Var2). The parse tree contains no error node for this case. How should the grammar be modified so that this scenario is caught as well?
To recognize IDs like Fields.V1, change your lexer rule for ID to something like this:
fragment ID_NODE: [a-zA-Z_][a-zA-Z0-9]*;
ID: ID_NODE ('.' ID_NODE)*;
Note that since each "node" of the ID follows the same rule, I made it a lexer fragment that I could use to compose the ID rule. I also added 0-9 to the second part of the fragment, since it appears that you want to allow digits in IDs.
Then the ID rule uses the fragment to build out the Lexer rule that allows for dots in the ID.
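If it helps to see what the composed rule accepts, the same pattern can be sketched as a Python regular expression. This is only an approximation for experimenting with inputs, not what ANTLR generates; the names ID_NODE, ID and is_id are reused or made up for the illustration.

```python
import re

# Mirrors the fragment ID_NODE: a letter or underscore, then letters or digits.
ID_NODE = r"[a-zA-Z_][a-zA-Z0-9]*"
# Mirrors ID: one node followed by any number of '.'-separated nodes.
ID = re.compile(rf"{ID_NODE}(?:\.{ID_NODE})*")

def is_id(text):
    """True if the whole text is a single dotted identifier."""
    return ID.fullmatch(text) is not None
```

Here is_id("Fields.V1") and is_id("Var1") are true, while is_id("Fields.") and is_id("1abc") are false, because every node must start with a letter or underscore.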
You also didn't add ID as a valid expr alternative.
To detect the error in (Var1)(Var2), follow Mike's advice and add the EOF token to the end of the parse parser rule. Without EOF, ANTLR stops parsing as soon as it reaches the end of a recognized expr, here (Var1). The EOF says "and then you need to find the end of the input", so ANTLR continues parsing into (Var2) and reports the error.
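The effect of the missing EOF can be reproduced with a small hand-written recognizer. The sketch below is a simplified stand-in in Python, not the code ANTLR generates: it handles only parenthesized names and binary operators, and the require_eof flag is a hypothetical switch that plays the role of the EOF token.

```python
import re

OPERATORS = {"+", "-", "*", "/"}

def tokenize(text):
    return re.findall(r"[A-Za-z_][A-Za-z0-9_.]*|\d+|[-+*/()]", text)

def parse_atom(tokens, pos):
    """atom: NAME | NUM | '(' expr ')' ; returns the index after the match."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing ')'")
        return pos + 1
    if tokens[pos] == ")" or tokens[pos] in OPERATORS:
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return pos + 1

def parse_expr(tokens, pos):
    """expr: atom (operator atom)* ; returns the index after the match."""
    pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in OPERATORS:
        pos = parse_atom(tokens, pos + 1)
    return pos

def parse(text, require_eof):
    tokens = tokenize(text)
    end = parse_expr(tokens, 0)
    # Without this check the parser silently ignores whatever follows
    # the first complete expression.
    if require_eof and end != len(tokens):
        raise SyntaxError(f"extraneous input {tokens[end]!r}")
    return end  # number of tokens consumed
```

Calling parse("(Var1)(Var2)", require_eof=False) returns 3: the recognizer consumes (Var1) and stops without complaint. With require_eof=True the same input raises a SyntaxError at the second opening parenthesis.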
A revised version that handles both of your examples:
grammar ExpressionGrammar;
parse: expr EOF;
expr:
MIN expr
| expr ( MUL | DIV) expr
| expr ( ADD | MIN) expr
| NUM
| ID
| function
| '(' expr ')';
function: ID '(' arguments? ')';
arguments: expr ( ',' expr)*;
/* Tokens */
MUL: '*';
DIV: '/';
MIN: '-';
ADD: '+';
OPEN_PAR: '(';
CLOSE_PAR: ')';
NUM: '0' | [1-9][0-9]*;
fragment ID_NODE: [a-zA-Z_][a-zA-Z0-9]*;
ID: ID_NODE ('.' ID_NODE)*;
COMMENT: '//' ~[\r\n]* -> skip;
WS: [ \t\n]+ -> skip;
(Now that I've read through the comments, this is pretty much just applying the suggestions in the comments)

Is it possible to make this YACC grammar unambiguous? expr: ... | expr expr

I am writing a simple calculator in yacc / bison.
The grammar for an expression looks somewhat like this:
expr
: NUM
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '+' expr %prec '*' { $$ = $2; }
| '-' expr %prec '*' { $$ = -$2; }
| '(' expr ')' { $$ = $2; }
| expr expr { $$ = $1 * $2; }
;
I have declared the precedence of the operators like this.
%left '+' '-'
%left '*' '/'
%nonassoc '('
The problem is with the last rule:
expr expr { $$ = $1 * $2; }
I want this rule because I want to be able to write expressions like 5(3+4)(3-24) in my calculator.
Is it possible to make this grammar unambiguous?
The ambiguity results from the fact that you allow unary operators (- expr), so 2 - 2 can be parsed either as a simple subtraction (yielding 0) or as an implicit product (of 2 and -2, yielding -4).
It's clear that subtraction is intended (otherwise subtraction would be impossible to represent) so it is necessary to ban the production expr: expr expr if the second expr on the right-hand side is a unary operation.
That can't be done with precedence declarations (or at least it cannot be done in an obvious way), so the best solution is to write out the grammar explicitly, without relying on precedence to disambiguate.
You will also have to decide exactly what is the precedence of implicit multiplication: either the same as explicit multiplication/division, or stronger. That affects how ab/cd is parsed. There is no consensus that I know of, so it is more or less up to you.
In the following, I assume that implicit multiplication binds more tightly. I also ensure that -ab is parsed as (-a)b, although -(ab) has the same end result (until you start dealing with things like non-arithmetic types and automatic conversions). So just take it as an example.
term: NUM
| '(' expr ')'
unop: term
| '-' unop
| '+' unop
conc: unop
| conc term
prod: conc
| prod '*' conc
| prod '/' conc
expr: prod
| expr '+' prod
| expr '-' prod
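To check how this layered grammar behaves on concrete inputs, it can be transcribed into a small recursive-descent evaluator. The following is a sketch in Python, not bison output: each method corresponds to one nonterminal above, the left-recursive rules are rewritten as loops, and the Parser and evaluate names are made up for the illustration.

```python
import re

class Parser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def starts_term(self):
        tok = self.peek()
        return tok is not None and (tok == "(" or tok.isdigit())

    def term(self):  # term: NUM | '(' expr ')'
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    def unop(self):  # unop: term | '-' unop | '+' unop
        if self.peek() == "-":
            self.take()
            return -self.unop()
        if self.peek() == "+":
            self.take()
            return self.unop()
        return self.term()

    def conc(self):  # conc: unop | conc term
        value = self.unop()
        # Only a term may follow, so "2 - 2" is never an implicit product.
        while self.starts_term():
            value *= self.term()
        return value

    def prod(self):  # prod: conc | prod '*' conc | prod '/' conc
        value = self.conc()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.conc()
            else:
                value /= self.conc()
        return value

    def expr(self):  # expr: prod | expr '+' prod | expr '-' prod
        value = self.prod()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.prod()
            else:
                value -= self.prod()
        return value

def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this sketch, evaluate("5(3+4)(3-24)") is -735 and evaluate("2 - 2") is 0. Because implicit multiplication binds more tightly than division here, evaluate("6/2(3)") is 1.0 rather than 9.0.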

Controlling Parameter Slurping

I'm trying to write a grammar that supports functions calls without using parentheses:
f x, y
As in Haskell, I'd like function calls to minimally slurp up their parameters. That is, I want
g 5 + 3
to mean
(g 5) + 3
instead of
g (5 + 3)
Unfortunately, I'm getting the second parse with this grammar:
grammar Parameters;
expr
: '(' expr ')'
| expr MULTIPLICATIVE_OPERATOR expr
| expr ADDITIVE_OPERATOR expr
| ID (expr (',' expr)*?)??
| INT
;
MULTIPLICATIVE_OPERATOR: [*/%];
ADDITIVE_OPERATOR: '+';
ID: [a-z]+;
INT: '-'? [0-9]+;
WHITESPACE: [ \t\n\r]+ -> skip;
The parse tree I'm getting is this:
I had thought that the subrule listed first would get attempted first. In this case, expr ADDITIVE_OPERATOR expr appears before the ID subrule, so why is the ID subrule taking higher precedence?
In this case ANTLR does not apply the correct rule transformation (the one that eliminates left recursion and handles precedence). It produces:
expr
: expr_1[0]
;
expr_1[int p]
: ('(' expr_1[0] ')' | INT | ID (expr_1[0] (',' expr_1[0])*?)??)
( {4 >= $p}? MULTIPLICATIVE_OPERATOR expr_1[5]
| {3 >= $p}? ADDITIVE_OPERATOR expr_1[4]
)*
;
leading to (expr (expr_1 a (expr_1 5 + (expr_1 3))))
The correct transformation would be:
expr
: expr_1[0]
;
expr_1[int p]
: ('(' expr_1[0] ')' | INT | ID (expr_1[5] (',' expr_1[5])*?)??)
( {4 >= $p}? MULTIPLICATIVE_OPERATOR expr_1[5]
| {3 >= $p}? ADDITIVE_OPERATOR expr_1[4]
)*
;
leading to (expr (expr_1 a (expr_1 5) + (expr_1 3)))
I am not certain whether this is a bug in ANTLR4 or a trade-off of the transformation algorithm. It may be worth filing an issue with the ANTLR4 project.
To solve your problem, you can simply put the correctly transformed grammar into your code and it should work. The rule transformation is explained in "The Definitive ANTLR 4 Reference", pages 249ff (and perhaps somewhere on the web).
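The difference between the two transformed grammars comes down to the precedence level passed when parsing call arguments. The sketch below mimics expr_1[p] in plain Python to show that effect; it is a hand-written approximation and not ANTLR's generated code. The arg_prec parameter is hypothetical, negative INT literals are left out, and arguments are consumed greedily, which is a simplification of the non-greedy subrule in the question.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+|\d+|[+*/%(),]", text)

def parse(text, arg_prec):
    """Return a parenthesized string; arg_prec is the level used for arguments."""
    tokens = tokenize(text)

    def starts_expr():
        return bool(tokens) and (tokens[0] == "(" or tokens[0].isalnum())

    def primary():
        tok = tokens.pop(0)
        if tok == "(":
            inner = expr_1(0)
            tokens.pop(0)  # consume ')'
            return inner
        if tok.isdigit():
            return tok
        # ID, optionally followed by comma-separated arguments.
        args = []
        if starts_expr():
            args.append(expr_1(arg_prec))
            while tokens and tokens[0] == ",":
                tokens.pop(0)
                args.append(expr_1(arg_prec))
        return f"({tok} {' '.join(args)})" if args else tok

    def expr_1(p):
        left = primary()
        while tokens:
            if tokens[0] in ("*", "/", "%") and 4 >= p:
                op = tokens.pop(0)
                left = f"({left} {op} {expr_1(5)})"
            elif tokens[0] == "+" and 3 >= p:
                op = tokens.pop(0)
                left = f"({left} {op} {expr_1(4)})"
            else:
                break
        return left

    return expr_1(0)
```

With arg_prec=0, as in the first transformation, parse("g 5 + 3", 0) gives (g (5 + 3)). With arg_prec=5, as in the corrected one, parse("g 5 + 3", 5) gives ((g 5) + 3), because an argument parsed at level 5 cannot absorb any binary operator.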

In ANTLR how do I skip the value in a simple expression parser?

Hi there, I have been trying to write a simple expression parser. Here is the grammar:
grammar extremelysimpleexpr ;
stat : expr ;
expr : sub ;
sub : add ( '-' add )* ;
add : VAL ( '+' VAL )*
| VAL
;
VAL : [0-9]+ ;
WS : [ \t\n\r]+ -> skip ;
It matches these expressions
1 + 1
0 + 3
4
But I do not want it to match a single occurrence of VAL. I want it to match 1 + 1 but not 4. How do I do that?
You'd have to insert predicates, something like this (untested):
stat : expr { $expr.start != $expr.stop }? ;
But don't do this! That's not a syntactic issue, but a semantic one. This is something you should validate after parsing, unless you want to complicate your grammar for such a little benefit.
Use visitors instead for all your checks.
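The idea of validating after parsing can be sketched without ANTLR at all. In the Python example below, a deliberately permissive parser builds a small tuple tree, and a separate check walks the result and rejects a lone value. The tree shape and the function names are made up for the illustration and stand in for an ANTLR parse tree and visitor; the parser assumes well-formed input.

```python
import re

def parse(text):
    """Permissive parse of VAL (('+'|'-') VAL)* into a nested tuple tree."""
    tokens = re.findall(r"\d+|[-+]", text)
    tree = ("val", tokens[0])
    for i in range(1, len(tokens), 2):
        tree = ("binop", tokens[i], tree, ("val", tokens[i + 1]))
    return tree

def check_non_trivial(tree):
    """Semantic check performed after parsing: reject a single value."""
    if tree[0] == "val":
        raise ValueError(f"expression is a single value: {tree[1]}")
    return tree
```

Here check_non_trivial(parse("1 + 1")) returns the tree unchanged, while check_non_trivial(parse("4")) raises a ValueError. The grammar stays simple and the rule about trivial expressions lives in one place.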
By the way, your grammar assigns different precedence levels to the - and + operators... I'm not sure this is what you want.
With ANTLR4 you could just write this:
expr : '(' expr ')'
| '-' expr
| expr ('*'|'/') expr
| expr ('+'|'-') expr
| VAL
;
This grammar enforces non-trivial expressions syntactically:
stat : expr ( '+' expr )+
| expr ( '-' expr )+
;
expr : expr ( '+' expr )+
| expr ( '-' expr )+
| VAL
;