YACC Grammar fix shift-reduce error with increment/decrement - grammar

I'm new to grammar's and I can't fix a shift-reduce error.
I want my language to accept expressions that are a simple ID, ID++, ID--, --ID or ID++.
I have the following definition:
lvalue : ID (will be extended to have more ways to address variables)
expr : lvalue
| lvalue INCR
| lvalue DECR
| INCR lvalue
| DECR lvalue
| lvalue ATR expr
(...)
| expr '&' expr
| expr '|' expr
| '(' expr ')'
;
I have the following precedences:
%nonassoc INCR DECR
%left '+' '-'
%right ATR
(...)
For INCR lvalue and DECR lvalue I don't get any error, but YACC says it has a shift-reduce error on lvalue INCR and lvalue DECR. The output says:
"95: shift/reduce conflict (shift 123, reduce 57) on INCR
95: shift/reduce conflict (shift 124, reduce 57) on DECR
state 95
expr : lvalue . (57)
expr : lvalue . INCR (61)
expr : lvalue . DECR (62)
expr : lvalue . ATR expr (65)"
I have tried removing lvalue ATR expr but it doesn't solve the issue. Only removing expr : lvalue solves the issue, but I need the expression to be a simple ID too.
Can you help me fix this or tell me where to look?

The conflict indicates that you have a rule for lvalue that right-recursive and has no precedence -- that is a rule something like:
lvalue: something lvalue
or several rules that can combine to the same effect. That's the rule(s) you need to look at.

Related

ANTLR operator precedence broken by optional right recursion?

I'm confused by the behavior of this grammar (in ANTLR 4.8):
grammar Bug;
stat: expr ';' ;
expr: expr '*' expr?
| expr '+' expr
| '(' expr ')'
| INT
| ID
;
ID : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS : [ \t\n\r]+ -> skip ;
That's a minimal modification of an example from the book; all I've done is add a ? to the first alternative for expr, so that * can be either a postfix unary operator or a binary operator.
To my surprise that seems to break the logic around binary operator precedence:
without the ?, 3*4+5; parses as (stat (expr (expr (expr 3) * (expr 4)) + (expr 5)) ;) (as expected)
with the ?, 3*4+5; parses as (stat (expr (expr 3) * (expr (expr 4) + (expr 5))) ;) (wat?)
Is this a bug, or is this behavior expected? How do I get the behavior I was hoping for?
Not sure if this is expected behavior... However, the more readable grammar like this seems to do what you expect it to (and preserves precedence):
expr: expr '*' expr #MulExpr
| expr '+' expr #Addxpr
| expr '*' #PointerExpr
| '(' expr ')' #NestedExpr
| INT #IntExpr
| ID #IdExpr
;
The #...Expr after each alternative are called labels.

Issue with whitespace and int literals / binary operators

I am trying to write a grammar that allows for
Signed integers (i.e. integers with or without a sign; 3, -2, +5)
Unary minus (-)
Binary addition and subtraction (+, -)
Here is the relevant grammar:
expr: INTLITER
| unaryOp expr
| expr binaryOp expr
| OPEN_PAREN expr CLOSE_PAREN
;
unaryOp: MINUS ; // Other operators ommitted for clarity
binaryOp: PLUS | MINUS ;
INTLITER: INTSIGN? DIGIT+ ;
fragment INTSIGN : PLUS | MINUS
WS: [ \r\n\t] -> skip ; // Ignore whitespace
I'm finding a strange issue concerning whitespace.
Consider the expression (2+ 1); this gives a correct parse tree, as expected, like so:
However, (2+1) gives this parse tree:
Since the WS rule means that whitespace is ignored, how is the whitespace here affecting the parse tree?
How might I fix this problem?
The problem with the grammar is that you are trying to represent signed numbers as a token in the lexer. Define the INTLITER without "INTSIGN?". The grammar now works.
grammar arithmetic;
expr: INTLITER
| unaryOp expr
| expr binaryOp expr
| OPEN_PAREN expr CLOSE_PAREN
;
unaryOp: MINUS ; // Other operators ommitted for clarity
binaryOp: PLUS | MINUS ;
INTLITER: DIGIT+ ;
WS: [ \r\n\t] -> skip ; // Ignore whitespace
fragment DIGIT
: ('0' .. '9')+
;
OPEN_PAREN
: '('
;
CLOSE_PAREN
: ')'
;
PLUS
: '+'
;
MINUS
: '-'
;

Reduce/reduce conflict in clike grammar in jison

I'm working on the clike language compiler using Jison package. I went really well until I've introduced classes, thus Type can be a LITERAL now. Here is a simplified grammar:
%lex
%%
\s+ /* skip whitespace */
int return 'INTEGER'
string return 'STRING'
boolean return 'BOOLEAN'
void return 'VOID'
[0-9]+ return 'NUMBER'
[a-zA-Z_][0-9a-zA-Z_]* return 'LITERAL'
"--" return 'DECR'
<<EOF>> return 'EOF'
"=" return '='
";" return ';'
/lex
%%
Program
: EOF
| Stmt EOF
;
Stmt
: Type Ident ';'
| Ident '=' NUMBER ';'
;
Type
: INTEGER
| STRING
| BOOLEAN
| LITERAL
| VOID
;
Ident
: LITERAL
;
And the jison conflict:
Conflict in grammar: multiple actions possible when lookahead token is LITERAL in state 10
- reduce by rule: Ident -> LITERAL
- reduce by rule: Type -> LITERAL
Conflict in grammar: multiple actions possible when lookahead token is = in state 10
- reduce by rule: Ident -> LITERAL
- reduce by rule: Type -> LITERAL
States with conflicts:
State 10
Type -> LITERAL . #lookaheads= LITERAL =
Ident -> LITERAL . #lookaheads= LITERAL =
I've found quite a similar question that has no been answered, does any one have any clue how to solve this?
That's evidently a bug in jison, since the grammar is certainly LALR(1), and is handled without problems by bison. Apparently, jison is incorrectly computing the lookahead for the state in which the conflict occurs. (Update: It seems to be bug 205, reported in January 2014.)
If you ask jison to produce an LR(1) parser instead of an LALR(1) grammar, then it correctly computes the lookaheads and the grammar passes without warnings. However, I don't think that is a sustainable solution.
Here's another work-around. The Decl and Assign productions are not necessary; the "fix" was to remove LITERAL from Type and add a separate production for it.
Program
: EOF
| Stmt EOF
;
Decl
: Type Ident ';'
| LITERAL Ident ';'
;
Assign
: Ident '=' NUMBER ';'
;
Stmt
: Decl
| Assign
;
Type
: INTEGER
| STRING
| BOOLEAN
| VOID
;
Ident
: LITERAL
;
You might want to consider recognizing more than one statement:
Program
: EOF
| Stmts EOF
;
Stmts
: Stmt
| Stmts Stmt
;

How do I find reduce/shift conflicts in my yacc code?

I'm writing a flex/yacc program that should read some tokens and easy grammar using cygwin.
I'm guessing something is wrong with my BNF grammar, but I can't seem to locate the problem. Below is some code
%start statement_list
%%
statement_list: statement
|statement_list statement
;
statement: condition|simple|loop|call_func|decl_array|decl_constant|decl_var;
call_func: IDENTIFIER'('ID_LIST')' {printf("callfunc\n");} ;
ID_LIST: IDENTIFIER
|ID_LIST','IDENTIFIER
;
simple: IDENTIFIER'['NUMBER']' ASSIGN expr
|PRINT STRING
|PRINTLN STRING
|RETURN expr
;
bool_expr: expr E_EQUAL expr
|expr NOT_EQUAL expr
|expr SMALLER expr
|expr E_SMALLER expr
|expr E_LARGER expr
|expr LARGER expr
|expr E_EQUAL bool
|expr NOT_EQUAL bool
;
expr: expr ADD expr {$$ = $1+$3;}
| expr MULT expr {$$ = $1-$3;}
| expr MINUS expr {$$ = $1*$3;}
| expr DIVIDE expr {if($3 == 0) yyerror("divide by zero");
else $$ = $1 / $3;}
|expr ASSIGN expr
| NUMBER
| IDENTIFIER
;
bool: TRUE
|FALSE
;
decl_constant: LET IDENTIFIER ASSIGN expr
|LET IDENTIFIER COLON "bool" ASSIGN bool
|LET IDENTIFIER COLON "Int" ASSIGN NUMBER
|LET IDENTIFIER COLON "String" ASSIGN STRING
;
decl_var: VAR IDENTIFIER
|VAR IDENTIFIER ASSIGN NUMBER
|VAR IDENTIFIER ASSIGN STRING
|VAR IDENTIFIER ASSIGN bool
|VAR IDENTIFIER COLON "Bool" ASSIGN bool
|VAR IDENTIFIER COLON "Int" ASSIGN NUMBER
|VAR IDENTIFIER COLON "String" ASSIGN STRING
;
decl_array: VAR IDENTIFIER COLON "Int" '[' NUMBER ']'
|VAR IDENTIFIER COLON "Bool" '[' NUMBER ']'
|VAR IDENTIFIER COLON "String" '[' NUMBER ']'
;
condition: IF '(' bool_expr ')' statement ELSE statement;
loop: WHILE '(' bool_expr ')' statement;
I've tried changing statement into
statement:';';
,reading a simple token to test if it works, but it seems like my code refuses to enter that part of the grammar.
Also when I compile it, it tells me there are 18 shift/reduce conflicts. Should I try to locate and solve all of them ?
EDIT: I have edited my code using Chris Dodd's answer, trying to solve each conflict by looking at the output file. The last few conflicts seem to be located in the below code.
expr: expr ADD expr {$$ = $1+$3;}
| expr MULT expr {$$ = $1-$3;}
| expr MINUS expr {$$ = $1*$3;}
| expr DIVIDE expr {if($3 == 0) yyerror("divide by zero");
else $$ = $1 / $3;}
|expr ASSIGN expr
| NUMBER
| IDENTIFIER
;
And here is part of the output file telling me what's wrong.
state 60
28 expr: expr . ADD expr
29 | expr . MULT expr
30 | expr . MINUS expr
31 | expr . DIVIDE expr
32 | expr . ASSIGN expr
32 | expr ASSIGN expr .
ASSIGN shift, and go to state 36
ADD shift, and go to state 37
MINUS shift, and go to state 38
MULT shift, and go to state 39
DIVIDE shift, and go to state 40
ASSIGN [reduce using rule 32 (expr)]
ADD [reduce using rule 32 (expr)]
MINUS [reduce using rule 32 (expr)]
MULT [reduce using rule 32 (expr)]
DIVIDE [reduce using rule 32 (expr)]
$default reduce using rule 32 (expr)
I don't understand, why would it choose rule 32 when it read ADD, MULT, DIVIDE or other tokens ? What's wrong with this part of my grammar?
Also, even though that above part of the grammar is wrong, shouldn't my compiler be able to read other grammar correctly ? For instance,
let a = 5
should be readable, yet the program returns syntax error ?
Your grammar looks reasonable, though it does have ambiguities in expressions, most of which could be solved by precedence. You should definitely look at ALL the conflicts reported and understand why they occur, and preferrably change the grammar to get rid of them.
As for your specific issue, if you change it to have statement: ';' ;, it should accept that. You don't show any of your lexing code, so the problem may be there. It may be helpful to compile you parser with -DYYDEBUG=1 to enable the debugging code generated by yacc/bison, and set the global variable yydebug to 1 before calling yyparse, which will dump a trace of everything the parser is doing to stderr.

byacc shift/reduce

I'm having trouble figuring this one out as well as the shift reduce problem.
Adding ';' to the end doesn't solve the problem since I can't change the language, it needs to go just as the following example. Does any prec operand work?
The example is the following:
A variable can be declared as: as a pointer or int as integer, so, both of this are valid:
<int> a = 0
int a = 1
the code goes:
%left '<'
declaration: variable
| declaration variable
variable : type tNAME '=' expr
| type tNAME
type : '<' type '>'
| tINT
expr : tINTEGER
| expr '<' expr
It obviously gives a shift/reduce problem afer expr. since it can shift for expr of "less" operator or reduce for another variable declaration.
I want precedence to be given on variable declaration, and have tried to create a %nonassoc prec_aux and put after '<' type '>' %prec prec_aux and after type tNAME but it doesn't solve my problem :S
How can I solve this?
Output was:
Well cant figure hwo to post linebreaks and code on reply... so here it goes the output:
35: shift/reduce conflict (shift 47, reduce 7) on '<'
state 35
variable : type tNAME '=' expr . (7)
expr : expr . '+' expr (26)
expr : expr . '-' expr (27)
expr : expr . '*' expr (28)
expr : expr . '/' expr (29)
expr : expr . '%' expr (30)
expr : expr . '<' expr (31)
expr : expr . '>' expr (32)
'>' shift 46
'<' shift 47
'+' shift 48
'-' shift 49
'*' shift 50
'/' shift 51
'%' shift 52
$end reduce 7
tINT reduce 7
Thats the output and the error seems the one I mentioned.
Does anyone know a different solution, other than adding a new terminal to the language that isn't really an option?
I think the resolution is to rewrite the grammar so it can lookahead somehow and see if its a type or expr after the '<' but I'm not seeing how to.
Precedence is unlikely to work since its the same character. Is there a way to give precendence for types that we define? such as declaration?
Thanks in advance
Your grammar gets confused in text like this:
int a = b
<int> c
That '<' on the second line could be part of an expression in the first declaration. It would have to look ahead further to find out.
This is the reason most languages have a statement terminator. This produces no conflicts:
%%
%token tNAME;
%token tINT;
%token tINTEGER;
%token tTERM;
%left '<';
declaration: variable
| declaration variable
variable : type tNAME '=' expr tTERM
| type tNAME tTERM
type : '<' type '>'
| tINT
expr : tINTEGER
| expr '<' expr
It helps when creating a parser to know how to design a grammar to eliminate possible conflicts. For that you would need an understanding of how parsers work, which is outside the scope of this answer :)
The basic problem here is that you need more lookahead than the 1 token you get with yacc/bison. When the parser sees a < it has no way of telling whether its done with the preivous declaration and its looking at the beginning of a bracketed type, or if this is a less-than operator. There's two basic things you can do here:
Use a parsing method such as bison's %glr-parser option or btyacc, which can deal with non-LR(1) grammars
Use the lexer to do extra lookahead and return disambiguating tokens
For the latter, you would have the lexer do extra lookahead after a '<' and return a different token if its followed by something that looks like a type. The easiest is to use flex's / lookahead operator. For example:
"<"/[ \t\n\r]*"<" return OPEN_ANGLE;
"<"/[ \t\n\r]*"int" return OPEN_ANGLE;
"<" return '<';
Then you change your bison rules to expect OPEN_ANGLE in types instead of <:
type : OPEN_ANGLE type '>'
| tINT
expr : tINTEGER
| expr '<' expr
For more complex problems, you can use flex start states, or even insert an entire token filter/transform pass between the lexer and the parser.
Here is the fix, but not entirely satisfactory:
%{
%}
%token tNAME tINT tINTEGER
%left '<'
%left '+'
%nonassoc '=' /* <-- LOOK */
%%
declaration: variable
| declaration variable
variable : type tNAME '=' expr
| type tNAME
type : '<' type '>'
| tINT
expr : tINTEGER
| expr '<' expr
| expr '+' expr
;
This issue is a conflict between these two LR items: the dot-final:
variable : type tNAME '=' expr_no_less .
and this one:
expr : expr . '<' expr
Notice that these two have different operators. It is not, as you seem to think, a conflict between different productions involving the '<' operator.
By adding = to the precedence ranking, we fix the issue in the sense that the conflict diagnostic goes away.
Note that I gave = a high precedence. This will resolve the conflict by favoring the reduce. This means that you cannot use a '<' expression as an initializer:
int name = 4 < 3 // syntax error
When the < is seen, the int name = 4 wants to be reduced, and the idea is that < must be the start of the next declaration, as part of a type production.
To allow < relational expressions to be used as initializers, add the support for parentheses into the expression grammar. Then users can parenthesize:
int foo = (4 < 3) <int> bar = (2 < 1)
There is no way to fix that without a more powerful parsing method or hacks.
What if you move the %nonassoc before %left '<', giving it low precedence? Then the shift will be favored. Unfortunately, that has the consequence that you cannot write another <int> declaration after a declaration.
int foo = 3 <int> bar = 4
^ // error: the machine shifted and is now doing: expr '<' . expr.
So that is the wrong way to resolve the conflict; you want to be able to write multiple such declarations.
Another Note:
My TXR language, which implements something equivalent to Parse Expression Grammars handles this grammar fine. This is essentially LL(infinite), which trumps LALR(1).
We don't even have to have a separate lexical analyzer and parser! That's just something made necessary by the limitations of one-symbol-lookahead, and the need for utmost efficiency on 1970's hardware.
Example output from shell command line, demonstrating the parse by translation to a Lisp-like abstract syntax tree, which is bound to the variable dl (declaration list). So this is complete with semantic actions, yielding an output that can be further processed in TXR Lisp. Identifiers are translated to Lisp symbols via calls to intern and numbers are translated to number objects also.
$ txr -l type.txr -
int x = 3 < 4 int y
(dl (decl x int (< 3 4)) (decl y int nil))
$ txr -l type.txr -
< int > x = 3 < 4 < int > y
(dl (decl x (pointer int) (< 3 4)) (decl y (pointer int) nil))
$ txr -l type.txr -
int x = 3 + 4 < 9 < int > y < int > z = 4 + 3 int w
(dl (decl x int (+ 3 (< 4 9))) (decl y (pointer int) nil)
(decl z (pointer int) (+ 4 3)) (decl w int nil))
$ txr -l type.txr -
<<<int>>>x=42
(dl (decl x (pointer (pointer (pointer int))) 42))
The source code of (type.txr):
#(define ws)#/[ \t]*/#(end)
#(define int)#(ws)int#(ws)#(end)
#(define num (n))#(ws)#{n /[0-9]+/}#(ws)#(filter :tonumber n)#(end)
#(define id (id))#\
#(ws)#{id /[A-Za-z_][A-Za-z_0-9]*/}#(ws)#\
#(set id #(intern id))#\
#(end)
#(define type (ty))#\
#(local l)#\
#(cases)#\
#(int)#\
#(bind ty #(progn 'int))#\
#(or)#\
<#(type l)>#\
#(bind ty #(progn '(pointer ,l)))#\
#(end)#\
#(end)
#(define expr (e))#\
#(local e1 op e2)#\
#(cases)#\
#(additive e1)#{op /[<>]/}#(expr e2)#\
#(bind e #(progn '(,(intern op) ,e1 ,e2)))#\
#(or)#\
#(additive e)#\
#(end)#\
#(end)
#(define additive (e))#\
#(local e1 op e2)#\
#(cases)#\
#(num e1)#{op /[+\-]/}#(expr e2)#\
#(bind e #(progn '(,(intern op) ,e1 ,e2)))#\
#(or)#\
#(num e)#\
#(end)#\
#(end)
#(define decl (d))#\
#(local type id expr)#\
#(type type)#(id id)#\
#(maybe)=#(expr expr)#(or)#(bind expr nil)#(end)#\
#(bind d #(progn '(decl ,id ,type ,expr)))#\
#(end)
#(define decls (dl))#\
#(coll :gap 0)#(decl dl)#(end)#\
#(end)
#(freeform)
#(decls dl)