YACC Grammar: Operator precedence issue - yacc

I'm trying to get: (20 + (-3)) * 3 / (20 / 3) / 2 to equal 4. Right now it equals 17.
So basically it's doing (20/3) then dividing that by 2, then dividing 3 by [(20/3)/2], then multiplying that by 17. Not sure how to alter my grammar/rules/precedences to get it to read correctly. Any guidance would be appreciated, thanks.
%%
start: PROGRAMtoken IDtoken IStoken compoundstatement
compoundstatement: BEGINtoken {print_header();} statement semi ENDtoken {print_end();}
semi: SEMItoken statement semi
|
statement: IDtoken EQtoken exp
{ regs[$1] = $3; }
| PRINTtoken exp
{ cout << $2 << endl; }
| declaration
declaration: VARtoken IDtoken comma
comma: COMMAtoken IDtoken comma
|
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| factor TIMEStoken term
{$$ = $1 * $3;
}
| factor DIVIDEtoken term
{ $$ = $1 / $3;
}
factor: ICONSTtoken
{ $$ = $1;}
| IDtoken
{ $$ = regs[$1]; }
| LPARENtoken exp RPARENtoken
{ $$ = $2;}
%%
My tokens and types look like:
%token BEGINtoken
%token COMMAtoken
%left DIVIDEtoken
%left TIMEStoken
%token ENDtoken
%token EOFtoken
%token EQtoken
%token <value> ICONSTtoken
%token <value> IDtoken
%token IStoken
%token LPARENtoken
%left PLUStoken MINUStoken
%token PRINTtoken
%token PROGRAMtoken
%token RPARENtoken
%token SEMItoken
%token VARtoken
%type <value> exp
%type <value> term
%type <value> factor

You really wanted someone to work hard to give you an answer, which is why the question has hung around for a year. Have a read of this help page: https://stackoverflow.com/help/how-to-ask, in particular the part about simplifying the problem. There are lots of rules in your grammar file that were not needed to reproduce the problem. We did not need:
%token BEGINtoken
%token COMMAtoken
%token ENDtoken
%token EOFtoken
%token EQtoken
%token <value> IDtoken
%token IStoken
%token PROGRAMtoken
%token VARtoken
%%
start: PROGRAMtoken IDtoken IStoken compoundstatement
compoundstatement: BEGINtoken {print_header();} statement semi ENDtoken {print_end();}
semi: SEMItoken statement semi
|
| declaration
declaration: VARtoken IDtoken comma
comma: COMMAtoken IDtoken comma
|
You could have just removed these tokens and rules to get to the heart of the operator precedence issue. We did not need any variables, declarations, assignments or program structure to illustrate the failure. Learning to simplify is the heart of competent debugging and thus programming. If you'd done this simplification more people would have had a go at answering. I'm saying this not for the OP, but for those that will follow with similar problems!
I'm wondering what school is setting this assignment, as I've seen a fair number of yacc questions on SO around the same dumb problem. I suspect more will come here every year, so answering this will help them. I knew what the issue was on inspection of the grammar, but to test my solution I had to code up a working lexer, some symbol table routines, a main program and other ancillary code. Again, another deterrent for problem solvers.
Lets get to the heart of the problem. You have these token declarations:
%left DIVIDEtoken
%left TIMEStoken
%left PLUStoken MINUStoken
These tell yacc that if any rules are ambiguous that the operators associate left. Your rules for these operators are:
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| factor TIMEStoken term
{$$ = $1 * $3;
}
| factor DIVIDEtoken term
{ $$ = $1 / $3;
}
However, these rules are not ambiguous, and thus the operator precedence declaration is not required. Yacc will follow the non-ambiguous grammar you have used. The ways these rules are written, it tells yacc that the operators have right associativity, which is the opposite of what you want. Now, it could be clearly seen from the simple arithmetic in your example that the operators were being calculated in a right associative way, and you wanted the opposite. There were really big clues there weren't there?
OK. How to change the associativity? One way would be to make the grammar ambiguous again so that the %left declaration is used, or just flip the rules around to invert the associativity. That's what I did:
exp: term PLUStoken exp
{ $$ = $1 + $3; }
| term MINUStoken exp
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| term TIMEStoken factor
{$$ = $1 * $3;
}
| term DIVIDEtoken factor
{ $$ = $1 / $3;
}
Do you see what I did there? I rotated the grammar rule around the operator.
Now for some more disclaimers. I said this is a dumb exercise. Interpreting expressions on the fly is a poor use of the yacc tool, and not what happens in real compilers or interpreters. In a realistic implementation, a parse tree would be built and the value calculations would be performed during the tree walk. This would then enable the issues of undeclared variables to be resolved (which also occurs in this exercise). The use of the regs array to store values is also dumb, because there is clearly an ancillary symbol table in use to return a unique integer ID for the symbols. In a real compiler/interpreter those values would also be stored in that symbol table.
I hope this tutorial has helped further students understand these parsing issues in their classwork.

Related

Calculator in lex and yacc

I am trying to create a calculator by using lex and yacc. However I can not understand how can I give operator precedence to this program? I could not find any information about it. Which code do I need to add to my project to calculate correctly?
Yacc file is:
%{
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
int yylex();
void yyerror(const char *s);
%}
%token INTEGER
%left '*' '/'
%left '+' '-'
%%
program:
program line | line
line:
expr ';' { printf("%d\n",$1); } ; | '\n'
expr:
expr '+' term { $$ = $1 + $3; }
| expr '-' term { $$ = $1 - $3; }
| expr '*' term { $$ = $1 * $3; }
| expr '/' term { $$ = $1 / $3; }
| expr '%' term { $$ = $1 % $3; }
| expr '^' term { $$ = $1 ; }
| term { $$ = $1; }
term:
INTEGER { $$ = $1; }
%%
void yyerror(const char *s) { fprintf(stderr,"%s\n",s); return ; }
int main(void) { /*yydebug=1;*/ yyparse(); return 0; }
Lex file is:
%{
#include <stdlib.h>
#include <stdio.h>
void yyerror(char*);
extern int yylval;
#include "calc.tab.h"
#include<time.h>
%}
%%
[ \t]+ ; //skip whitespace
[0-9]+ {yylval = atoi(yytext); return INTEGER;}
[-+*/%^] {return *yytext;}
\n {return *yytext;}
; {return *yytext;}
. {char msg[25]; sprintf(msg,"%s <%s>","invalid character",yytext); yyerror(msg);}
%left '*' '/'
%left '+' '-'
Precedence declarations are specified in the order from lowest precedence to highest. So in the above code you give * and / the lowest precedence level and + and - the highest. That's the opposite order of what you want, so you'll need to switch the order of these two lines. You'll also want to add the operators % and ^, which are currently part of your grammar, but not your precedence annotations.
With those changes, you'll now have specified the precedence you want, but it won't take effect yet. Why not? Because precedence annotations are used to resolve ambiguities, but your grammar isn't actually ambiguous.
The way you've written the grammar, with only the left operand of all operators being expr and the right operand being term, there's only one way to derive an expression like 2+4*2, namely by deriving 2+4 from expr and 2 from term (because deriving 4*2 from term would be impossible since term can only match a single number). So your grammar treats all operators as left-associative and having the same precedence and your precedence annotations aren't considered at all.
In order for the precedence annotations to be considered, you'll have to change your grammar, so that both operands of the operators are expr (e.g. expr '+' expr instead of expr '+' term). Written like that an expression like 2+4*2 could either be derived by deriving 2+4 from expr as the left operand and 2 from expr as the right operand or 2 as the left and 4*2 as the right and this ambiguity will be resolved using your precedence annotations.

Warning: 2 shift/reduce conflicts [-Wconflicts-sr] err

%{
#include<stdio.h>
#include<stdlib.h>
int regs[30];
%}
%token NUMBER LETTER
%left PLUS MINUS
%left MULT DIV
%%
prog: prog st | ; //when I remove this line the error goes
st : E {printf("ans %d", $1);}| LETTER '=' E {regs[$1] = $3; printf("variable contains %d",regs[$1]);};
E : E PLUS E{$$ = $1 + $3;} //addition
| E MINUS E{$$ = $1 - $3 ;} //subtraction
| MINUS E{$$ = -$2;}
| E MULT E{$$ = $1 * $3 ;}
| E DIV E { if($3)$$= $1 / $3; else yyerror("Divide by 0");}
/*|LBRACE E RBRACE{$$= $2;}
| RBRACE E LBRACE{yyerror("Wrong expression");} */
| NUMBER {$$ = $1;}
| LETTER {$$ = regs[$1];}
;
%%
int main(void)
{
printf("Enter Expression: ");
yyparse();
return 0;
}
int yyerror(char *msg)
{
printf("%s", msg);// printing error
exit(0);
}
I am not able to resolve the conflicts. Also I am getting a segmentation fault when I run it with some edits. I am using yacc and lex for the same.
The two shift-reduce conflicts are the result of the fact that you don't require any explicit separator between statements. Because of that, a = b - 3 could be interpreted as one statement or as two (a = b; - 3). The second interpretation may not seem very natural to you but it is easily derived by the grammar.
In addition, your use of unary minus leads to an incorrect parse of -2/3 as -(2/3) instead of (-2)/3. (You may or may not find this serious, since it has few semantic consequences with these particular operators.) This particular issue and a correct resolution is discussed in the bison manual, and in many many other internet resources.
Both of these explanations are made a bit more visible if you use the -v command line option to bison to produce a description of the parser. See Understanding your parser (again, in the bison manual).

YACC Rules not reduced

This is my code calc.y. I keep getting the error:
yacc: 1 rule never reduced
yacc: 3 reduce/reduce conflicts
not really sure what this means
Ive done some research in other places but I am now lost. Im guessing the rules being referred to is program and statement but even so... what does the reduce rule mean?
%{
#include <stdio.h>
FILE *outfile;
int yyline = 1;
int yycolumn = 1;
%}
%union{
int nw;
struct{
int v;
char s[1000];
}attr;
}
%token SEMInumber
%token LPARENnumber
%token <nw> ICONSTnumber
%token BEGINnumber
%token PROGRAMnumber
%token MINUSnumber
%token TIMESnumber
%token <nw> VARnumber
%token INTnumber
%token EOFnumber
%token COMMAnumber
%token RPARENnumber
%token <nw>IDnumber
%token ENDnumber
%token ISnumber
%token PLUSnumber
%token DIVnumber
%token PRINTnumber
%token EQnumber
%type <attr> exp
%type <attr> term
%type <attr> factor
%%
program: PROGRAMnumber IDnumber ISnumber compstate
;
compstate: BEGINnumber {print_header();} statement ENDnumber{print_end();}
| BEGINnumber {print_header();} statement SEMInumber statement ENDnumber{print_end();}
;
statement: IDnumber EQnumber exp
| PRINTnumber exp
| declaration
;
declaration: VARnumber IDnumber
| VARnumber IDnumber COMMAnumber IDnumber
;
exp: term {$$.v = $1.v; strcpy($$.s, $1.s);}
| exp PLUSnumber term {$$.v = $1.v + $3.v; sprintf($$.s, "(%s) + (%s)", $1. s, $3.s);}
| exp MINUSnumber term {$$.v = $1.v - $3.v; sprintf($$.s, "(%s) - (%s)", $1. s, $3.s);}
;
term: factor {$$.v = $1.v; strcpy($$.s, $1.s);}
| term TIMESnumber factor {$$.v = $1.v * $3.v; sprintf($$.s, "(%s) * (%s)", $1.s, $3.s);}
| term DIVnumber factor {$$.v = $1.v / $3.v; sprintf($$.s, "(%s) / (%s)", $1.s, $3.s);}
;
factor: ICONSTnumber {$$.v = $1; sprintf($$.s, "%d", $1);}
| IDnumber {$$.v = $1.v; strcpy($$.s, $1.s);}
| LPARENnumber exp RPARENnumber {$$.v = $2.v; strcpy($$.s, $2.s);}
;
%%
int main()
{
if(!yyparse())
{
printf("accept\n");
}
else
printf("reject\n");
}
void print_header() {}
void print_end(){}
void yyerror(const char *str)
{
printf("yyerror: %s at line %d\n", str, yyline);
}
When compstate shifts the BEGINnumber token, two inner rules for the mid rule action {print_header();} can be both reduced resulting in a R/R conflict. Youe can replace
compstate: BEGINnumber {print_header();} statement ENDnumber{print_end();}
| BEGINnumber {print_header();} statement SEMInumber statement
ENDnumber{print_end();}
;
with, for example
begin_number:
BEGINnumber { print_header(); }
compstate: begin_number statement ENDnumber{print_end();}
| begin_number statement SEMInumber statement
ENDnumber{print_end();}
;
to solve the conflict.
Informative Messages
%s: %d rules never reduced
Some rules are never used, either because they weren't used in the grammar or because they
were on the losing end of shift/reduce or reduce/reduce conflicts. Either change the
grammar to use the rules or remove them.

bison grammar rules for a custom pascal like language

I'm trying to make a compiler for a custom pascal like language using bison and flex and I end up getting syntax errors for programs that should be correct according to my custom grammar.
My custom grammar:
<program> ::= program id
<block>
<block> ::= {
<sequence>
}
<sequence> ::= <statement> ( ; <statement> )*
<brackets-seq> ::= { <sequence> }
<brack-or-stat> ::= <brackets-seq> |
<statement>
<statement> ::= ε |
<assignment-stat> |
<if-stat> |
<while-stat>
<assignment-stat> ::= id := <expression>
<if-stat> ::= if (<condition>)
<brack-or-stat>
<elsepart>
<elsepart> ::= ε |
else <brack-or-stat>
<while-stat> ::= while (<condition>)
<brack-or-stat>
<expression> ::= <optional-sign> <term> ( <add-oper> <term>)*
<term> ::= <factor> (<mul-oper> <factor>)*
<factor> ::= constant |
(<expression>) |
id
<condition> ::= <boolterm> (and <boolterm>)*
<boolterm> ::= <boolfactor> (or <boolfactor>)*
<boolfactor> ::= not [<condition>] |
[<condition>] |
<expression> <relational-oper> <expression>
<relational-oper> ::= == | < | > | <> | <= | >=
<add-oper> ::= + | -
<mul-oper> ::= * | /
<optional-sign> ::= ε | <add-oper>
My grammar implementation on bison:
%{
#include <stdio.h>
#include <string.h>
int yylex(void);
void yyerror(char *s);
%}
%union {
int i;
char *s;
};
%token <i> INTEGERNUM
%token PROGRAM;
%token OR;
%token AND;
%token NOT;
%token IF;
%token ELSE;
%token WHILE;
%token PLUS;
%token MINUS;
%token MUL;
%token DIV;
%token LSB;
%token RSB;
%token LCB;
%token RCB;
%token LEFTPAR;
%token RIGHTPAR;
%token ID;
%token INT;
%token ASSIGN;
%token ISEQUAL;
%token LTHAN;
%token GTHAN;
%token NOTEQUAL;
%token LESSEQUAL;
%token GREATEREQUAL;
%left '+' '-'
%left '*' '/'
%%
program:
PROGRAM ID block
;
block:
LCB RCB
|LCB sequence RCB
;
sequence:
statement ';'sequence
|statement ';'
;
bracketsSeq:
LCB sequence RCB
;
brackOrStat:
bracketsSeq
|statement
;
statement:
assignmentStat
|ifStat
|whileStat
|
;
assignmentStat:
ID ':=' expression
ifStat:
IF LEFTPAR condition RIGHTPAR brackOrStat elsepart
;
elsepart:
ELSE brackOrStat
|
;
whileStat:
WHILE LEFTPAR condition RIGHTPAR brackOrStat
;
expression:
addOper expression
|expression addOper expression
|term
;
term:
term mulOper term
|factor
;
factor:
INT
|LEFTPAR expression RIGHTPAR
|ID
;
condition:
condition AND condition
|boolterm
;
boolterm:
boolterm OR boolterm
|boolfactor
;
boolfactor:
NOT LSB condition RSB
|LSB condition RSB
|expression relationalOper expression
;
relationalOper:
ISEQUAL
|LTHAN
|GTHAN
|NOTEQUAL
|LESSEQUAL
|GREATEREQUAL
;
addOper:
PLUS
|MINUS
;
mulOper:
MUL
|DIV
;
optionalSign
|addOper
;
%%
int main( int argc, char **argv )
{
extern FILE *yyin;
++argv, --argc; /* skip over program name */
if ( argc > 0 )
yyin = fopen( argv[0], "r" );
else
yyin = stdin;
do
yyparse();
while(!feof(yyin));
}
My flex implementation is pretty straightforward where I just return tokens for each symbol or identifier needed.
Using my implementation on the following simple program:
program circuit
{
a:=b;
}
I end up getting a syntax error. Specifically when the parsing reaches the point right after := according to my debugging prints I use:
$ ./a.exe verilog.txt
text = program
text = circuit val = circuit
text = {
text = a val = a
text = :=
syntax error
This is the first time I use flex and bison so I'm guessing that I made a wrong implementation of my original grammar to bison since after the ./bison.exe -dy comp.y command I get:
bison conflicts 64 shift/reduce
Any ideas would be helpful. Thanks!
This rule :
assignmentStat: ID ':=' expression
uses a token ':=' which bison gives a code distinct from any other token, and which your lexer has no way of knowing, so you're almost certainly not returning it. You're probably returning ASSIGN for the character sequence ':=', so you want:
assignmentStat: ID ASSIGN expression
For the shift-reduce conflicts, they mean that the parser doesn't match exactly the language you specified, but rather some subset (as determined by the default shift instead of reduce). You can use bison's -v option to get a complete printout of the parser state machine (including all the conflicts) in a .output file. You can then examine the conflicts and determine how you should change the grammar to match what you want.
When I run bison on your example, I see only 9 shift/reduce conflicts, all arising from expr: expr OP expr-style rules, which are ambiguous (may be either right- or left- recursive). The default resolution (shift) makes them all right-recursive, which may not be what you want. You can either change the grammar to not be ambiguous, or use bison's built-in precedence resolution tools to resolve them.

Yacc reading only the first grammar rule

I have this yacc file
%error-verbose
%token END
%token ID
%token INT
%token IF
%token ELSE
%token WHILE
%token FOR
%token BREAK
%token CONTINUE
%token RETURN
%token SEM
%token LPAR
%token RPAR
%token PLUS
%token MINUS
%token MULT
%token DIV
%token MOD
%token GT
%token LT
%token GTE /* >= */
%token LTE /* <= */
%token EQUAL /* == */
%token NEQUAL /* != */
%token AND
%token OR
%token EQ
%token COM
%token PRINT
%token READ
%token FLOAT
%token LABR
%token RABR
%token NUM
%token STR
/*
* precedentce tabLTE
*/
%right EQ PE ME TE DE RE
%left OR
%left AND
%left EQUAL NEQUAL
%left LT GT GTE LTE
%left PLUS MINUS
%left MULT DIV MOD
%right PP MM
%{
#include<stdio.h>
extern char *yyname;
extern char *yytext;
extern int yylineno;
void yyerror(char const *msg)
{
fprintf(stderr,"%s:%d:%s\n", yyname,yylineno,msg);
}
%}
%%
program
: definitions
;
definitions
: definition
| definitions definition
;
definition:
| declaration
;
declarations
: /* null */
| declarations declaration
;
declaration
: INT declarator_list SEM
;
declarator_list
: ID
| declarator_list COM ID
;
statements
: /* null */
| statements statement
;
statement
: expression SEM
| SEM /* null statement */
| if_prefix statement
| if_prefix statement ELSE statement
| loop_prefix statement
;
if_prefix
: IF LPAR expression RPAR
;
loop_prefix
: WHILE LPAR expression RPAR
;
expression
: binary
| expression COM binary
;
binary
: ID
| LPAR expression RPAR
| ID LPAR optional_argument_list RPAR
| binary PLUS binary
| binary MINUS binary
| binary MULT binary
| binary DIV binary
| binary MOD binary
| binary GT binary
| binary LT binary
| binary GTE binary
| binary LTE binary
| binary EQUAL binary
| binary NEQUAL binary
| binary AND binary
| binary OR binary
| ID EQ binary
| ID PE binary
| ID ME binary
| ID TE binary
| ID DE binary
| ID RE binary
;
optional_argument_list
: /* no actual arguments */
| argument_list
;
argument_list
: binary
| argument_list COM binary
;
%%
#include <stdlib.h>
extern FILE *yyin;
int main(int argc, char **argv)
{
int ok;
if (argc != 2) {
fprintf(stderr, "%s: Wrong arguments\n", argv[0]);
return EXIT_FAILURE;
}
yyname = argv[1];
if ((yyin = fopen(yyname, "r")) == NULL) {
fprintf(stderr, "%s: %s: Invalid file\n", argv[0], argv[1]);
return EXIT_FAILURE;
}
return (yyparse() ? EXIT_SUCCESS : EXIT_FAILURE);
}
when the input is
int x;
everything works fine, but when the input is something other than "INT"
lets say FOR it throws an error:
unexpected FOR expecting INT or $end
so it's actually reading only the first rule from the set of rules..
Besides, it keeps showing useless non terminals and terminals warning when bison command is applied.
What is wrong with this yacc file?
The trouble is that the rules:
program
: definitions
;
definitions
: definition
| definitions definition
;
definition:
| declaration
;
declarations
: /* null */
| declarations declaration
;
declaration
: INT declarator_list SEM
;
only allow declarations through; nothing allows statements as part of a program. Your FOR is not a declaration, so the grammar rejects it.
The 'useless non-terminals' warning is trying to tell you:
You have goofed big time; there is a bug in your grammar. You have tried to write rules for some production, but you never let it be recognized, so there was no point in adding it.
Or thereabouts...
Maybe you need:
program
: definitions statements
;
Or maybe you need to allow functions as a definition too, and then the FOR statement will be part of the body of a function.
Asking my LL oracle about your amended grammar:
Out of 15 non-terminals, 14 are reachable, 1 are unreachable:
'declarations'
Circular symbols:
definitions
definitions
The complaint about circular symbols means that 'definitions' can derive itself. For example, 'definitions' can produce 'definitions definition', but 'definition' is nullable, so 'definitions' can produce just itself, kinduva infinite loop few parser generators care to deal with in any sensible way. Looking at it another way, you've defined 'definitions' to be a list of nullable symbols, so how many epsilons would you like to match? How about infinity? :-)
This is a drawback of the yacc/bison style of trying to produce some parser even if there are problems in the grammar; quite convenient if you know exactly what you're doing, but quite confusing otherwise.
But, to the narrow point of what to do about the grammar circularity that's giving you a very unuseful (but by gum compilable!) parser. How about you allow 'definitions' be nullable but not 'definition'? IOW:
definitions : | definitions definition ;
definition : declaration ;
Try to not stack nullability on top of nullability. So when you later change to:
definition : declarations ;
Don't make 'declarations' nullable (that's already handled by 'definitions' being nullable). Instead, change it to:
declarations : declaration | declarations declaration ;
That should get you past the immediate problem and onto some new ones :-)