Bison Grammar %type and %token

Why is it that I have to use $<nVal>4 explicitly in the below grammar snippet?
I thought the %type <nVal> expr line would remove the need so that I can simply put $4?
Is it not possible to use a different definition for expr so that I can?
%union
{
int nVal;
char *pszVal;
}
%token <nVal> tkNUMBER
%token <pszVal> tkIDENT
%type <nVal> expr
%%
for_statement : tkFOR
                tkIDENT { printf( "I:%s\n", $2 ); }
                tkEQUALS
                expr { printf( "A:%d\n", $<nVal>4 ); } // Why not just $4?
                tkTO
                expr { printf( "B:%d\n", $<nVal>6 ); } // Why not just $6?
                step-statement
                list
                next-statement;

expr : tkNUMBER { $$ = $1; }
     ;
Update following rici's answer. This now works a treat:
for_statement : tkFOR
                tkIDENT { printf( "I:%s\n", $2 ); }
                tkEQUALS
                expr { printf( "A:%d\n", $5 /* $<nVal>5 */ ); }
                tkTO
                expr { printf( "B:%d\n", $8 /* $<nVal>8 */ ); }
                step-statement
                list
                next-statement;

Why is it that I have to use $<nVal>4 explicitly in the below grammar snippet?
Actually, you should use $5 if you want to refer to the expr. $4 is the tkEQUALS, which has no declared type, so any use must be explicitly typed. $3 is the previous midrule action, which has no value since $$ is not assigned in that action.
By the same logic, the second expr is $8; $6 is the second midrule action, which also has no value (and no type).
See the Bison manual:
The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number n to use in $n.
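The counting rule quoted above can be checked mechanically. The following is only a sketch in plain C, not anything Bison provides: dollar_index, FOR_RHS and the "{action}" placeholder are names invented here for illustration. It lists the right-hand side of for_statement with each mid-rule action as its own entry and reports the $n position of a component.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the 1-based $n position of the k-th occurrence of `name` among the
 * right-hand-side components of a rule, or 0 if there is no such occurrence.
 * Mid-rule actions are ordinary entries in `rhs`, so they use up a number too.
 */
int dollar_index(const char *const rhs[], int len, const char *name, int k) {
    assert(k >= 1);
    for (int i = 0; i < len; i++) {
        if (strcmp(rhs[i], name) == 0 && --k == 0)
            return i + 1;
    }
    return 0;
}

/*
 * Right-hand side of for_statement up to the second expr, as written in the
 * question. "{action}" is a placeholder (hypothetical name) standing for
 * each mid-rule action.
 */
const char *const FOR_RHS[] = {
    "tkFOR", "tkIDENT", "{action}", "tkEQUALS", "expr", "{action}", "tkTO", "expr"
};
enum { FOR_RHS_LEN = sizeof FOR_RHS / sizeof FOR_RHS[0] };
```

Calling dollar_index(FOR_RHS, FOR_RHS_LEN, "expr", 1) gives 5 and asking for the second expr gives 8, which matches the numbering in the answer; tkEQUALS comes out as 4.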

Related

Calculator in lex and yacc

I am trying to create a calculator using lex and yacc. However, I cannot work out how to give the operators precedence in this program, and I could not find any information about it. What do I need to add to my project so that it calculates correctly?
Yacc file is:
%{
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
int yylex();
void yyerror(const char *s);
%}
%token INTEGER
%left '*' '/'
%left '+' '-'
%%
program:
program line | line
line:
expr ';' { printf("%d\n",$1); } ; | '\n'
expr:
expr '+' term { $$ = $1 + $3; }
| expr '-' term { $$ = $1 - $3; }
| expr '*' term { $$ = $1 * $3; }
| expr '/' term { $$ = $1 / $3; }
| expr '%' term { $$ = $1 % $3; }
| expr '^' term { $$ = $1 ; }
| term { $$ = $1; }
term:
INTEGER { $$ = $1; }
%%
void yyerror(const char *s) { fprintf(stderr,"%s\n",s); return ; }
int main(void) { /*yydebug=1;*/ yyparse(); return 0; }
Lex file is:
%{
#include <stdlib.h>
#include <stdio.h>
void yyerror(char*);
extern int yylval;
#include "calc.tab.h"
#include<time.h>
%}
%%
[ \t]+ ; //skip whitespace
[0-9]+ {yylval = atoi(yytext); return INTEGER;}
[-+*/%^] {return *yytext;}
\n {return *yytext;}
; {return *yytext;}
. {char msg[25]; sprintf(msg,"%s <%s>","invalid character",yytext); yyerror(msg);}
%left '*' '/'
%left '+' '-'
Precedence declarations are specified in the order from lowest precedence to highest. So in the above code you give * and / the lowest precedence level and + and - the highest. That's the opposite order of what you want, so you'll need to switch the order of these two lines. You'll also want to add the operators % and ^, which are currently part of your grammar, but not your precedence annotations.
With those changes, you'll now have specified the precedence you want, but it won't take effect yet. Why not? Because precedence annotations are used to resolve ambiguities, but your grammar isn't actually ambiguous.
The way you've written the grammar, with only the left operand of all operators being expr and the right operand being term, there's only one way to derive an expression like 2+4*2, namely by deriving 2+4 from expr and 2 from term (because deriving 4*2 from term would be impossible since term can only match a single number). So your grammar treats all operators as left-associative and having the same precedence and your precedence annotations aren't considered at all.
For the precedence annotations to be considered, you'll have to change your grammar so that both operands of each operator are expr (e.g. expr '+' expr instead of expr '+' term). Written like that, an expression like 2+4*2 can be derived in two ways: with 2+4 as the left operand and 2 as the right, or with 2 as the left operand and 4*2 as the right. This ambiguity is then resolved using your precedence annotations.
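To make the difference concrete, here is a small hand-written evaluator in plain C. It is only a sketch: it is not how yacc resolves conflicts internally, and the names Cursor, eval_flat and eval_with_precedence are invented for this illustration. eval_flat mimics the expr OP term grammar from the question (one precedence level, grouped left to right), while eval_with_precedence uses precedence climbing to give * and / a higher level than + and -. It handles only non-negative integers and the four operators, and does not guard against division by zero.

```c
#include <assert.h>
#include <ctype.h>

/* Cursor over the input text (hypothetical helper type for this sketch). */
typedef struct { const char *p; } Cursor;

static void skip_spaces(Cursor *c) {
    while (*c->p == ' ' || *c->p == '\t') c->p++;
}

/* Read one non-negative integer: the "term" of the grammar in the question. */
static int read_number(Cursor *c) {
    skip_spaces(c);
    assert(isdigit((unsigned char)*c->p));
    int v = 0;
    while (isdigit((unsigned char)*c->p)) v = v * 10 + (*c->p++ - '0');
    return v;
}

static int apply(char op, int a, int b) {
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;  /* '/': callers only pass the four operators */
    }
}

/* Binding strength: higher binds tighter, 0 means "not an operator". */
static int prec(char op) {
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;
}

/* Like "expr: expr OP term": a single level, applied strictly left to right. */
int eval_flat(const char *s) {
    Cursor c = { s };
    int acc = read_number(&c);
    for (;;) {
        skip_spaces(&c);
        char op = *c.p;
        if (prec(op) == 0) return acc;
        c.p++;
        acc = apply(op, acc, read_number(&c));
    }
}

/* Precedence climbing: only operators of at least min_prec are consumed. */
static int climb(Cursor *c, int min_prec) {
    int lhs = read_number(c);
    for (;;) {
        skip_spaces(c);
        char op = *c->p;
        int p = prec(op);
        if (p == 0 || p < min_prec) return lhs;
        c->p++;
        int rhs = climb(c, p + 1);  /* p + 1 keeps each level left-associative */
        lhs = apply(op, lhs, rhs);
    }
}

int eval_with_precedence(const char *s) {
    Cursor c = { s };
    return climb(&c, 1);
}
```

For the input 2+4*2, eval_flat returns 12 (it computes (2+4)*2, as the original grammar does), whereas eval_with_precedence returns 10.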

How to fix the "integer out of range: $3" error in YACC

I am developing a calculator using YACC and I receive this error:
Integer out of range $3;
I have only just started learning yacc and can't fix the error. I have seen this question asked before, but no one has answered it.
%token NUMBER
%%
expr :expr '+'{$$ = $1 + $3;}
%%
#include<stdio.h>
#include "lex.yy.c"
yylex()
{
int c;
c=getchar();
if(isdigit(c))
{
yylval=c-'0';
return NUMBER;
}
return c;
}
int main()
{
yyparse();
return 1;
}
int yyerror(){
return 1;}
$3 refers to the 3rd term on the right side of the rule. In
expr :expr '+'{$$ = $1 + $3;}
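there are only two components on the right side of the production (expr and '+'), so there is no $3 for the action to refer to. The rule is presumably missing its right-hand operand, as in expr '+' expr.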

How to Read Multiple Lines of input file for arithmetic yacc program?

I am new to compilers and am learning to build a calculator that reads multi-line input (one equation per line) from a .txt file. I am running into a segmentation fault.
YACC Code :
%{
#include <stdio.h>
#include <string.h>
#define YYSTYPE int /* the attribute type for Yacc's stack */
extern int yylval; /* defined by lex, holds attrib of cur token */
extern char yytext[]; /* defined by lex and holds most recent token */
extern FILE * yyin; /* defined by lex; lex reads from this file */
%}
%token NUM
%%
Begin : Line
| Begin Line
;
Line : Calc {printf("%s",$$); }
;
Calc : Expr {printf("Result = %d\n",$1);}
Expr : Fact '+' Expr { $$ = $1 + $3; }
| Fact '-' Expr { $$ = $1 - $3; }
| Fact '*' Expr { $$ = $1 * $3; }
| Fact '/' Expr { $$ = $1 / $3; }
| Fact { $$ = $1; }
| '-' Expr { $$ = -$2; }
;
Fact : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id : NUM { $$ = yylval; }
;
%%
void yyerror(char *mesg); /* this one is required by YACC */
main(int argc, char* *argv){
char ch;
if(argc != 2) {printf("useage: calc filename \n"); exit(1);}
if( !(yyin = fopen(argv[1],"r")) ){
printf("cannot open file\n");exit(1);
}
yyparse();
}
void yyerror(char *mesg){
printf("Bad Expression : %s\n", mesg);
exit(1); /* stop after the first error */
}
LEX Code :
%{
#include <stdio.h>
#include "y.tab.h"
int yylval; /*declared extern by yacc code. used to pass info to yacc*/
%}
letter [A-Za-z]
digit [0-9]
num ({digit})*
op "+"|"*"|"("|")"|"/"|"-"
ws [ \t\n]
other .
%%
{ws} { /* note, no return */ }
{num} { yylval = atoi(yytext); return NUM;}
{op} { return yytext[0];}
{other} { printf("bad%cbad%d\n",*yytext,*yytext); return '?'; }
%%
/* c functions called in the matching section could go here */
I am trying to print the expression along with result.
Thanks In Advance.
In your parser, you have:
Line : Calc {printf("%s",$$); }
Now $$ is the semantic value which the rule is computing, and you haven't assigned anything to it. So it would not be unreasonable to assume that it is undefined, which would be bad, but in fact it does have a value because of the default rule $$ = $1;. All the same, it would be much more readable to write
printf("%s", $1);
But that's not correct, is it? After all, you have
#define YYSTYPE int
so all semantic types are integers. But you're telling printf that $1 is a string (%s). printf will believe you, so it will go ahead and try to dereference the int as though it were a char*, with predictable results (i.e., a segfault).
You are probably using a compiler which is clever enough to notice the fact that you are trying to print an int with a %s format code. But either you haven't asked the compiler to help you or you are ignoring its advice.
Always compile with warnings enabled. If you are using gcc or clang, that means putting -Wall in the command line. (If you are using some other compiler, find out how to produce warnings. It will be documented.) And then read the warnings and fix them before trying to run the program.
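The format mismatch described above goes away once the int value is paired with %d. A minimal sketch in plain C, assuming a hypothetical helper named format_result (it is not part of the question's code) that formats into a caller-supplied buffer:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format an int semantic value with the matching %d conversion.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if the buffer is too small or formatting fails.
 */
int format_result(char *buf, size_t size, int value) {
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "Result = %d\n", value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With gcc or clang and -Wall, changing %d to %s in this function should produce a format warning, which is the kind of diagnostic the answer is referring to.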
There are several other errors and/or questionable practices in your code. Your grammar is inaccurate (why do you use Fact as the left-hand operand of every operator?). Also, despite your comment, your lexical scanner ignores newline characters, so there is no way the parser can know whether expressions are one per line, two per line, or spread over multiple lines; that will make it hard to use the calculator as a command-line tool.
There is no need to define the lex macro digit; (f)lex recognizes the Posix character class [[:digit:]] (and others, documented in the flex manual) automatically. Nor is it particularly useful to define the macro num. Overuse of lex macros makes your program harder to read; it is usually better to just write the patterns out in place:
[[:digit:]]+ { yylval = atoi(yytext); return NUM; }
which would be more readable and less work both for you and for anyone reading your code. (If your professor or tutor disagrees, I'd be happy to discuss the matter with them directly.)

YACC Grammar: Operator precedence issue

I'm trying to get: (20 + (-3)) * 3 / (20 / 3) / 2 to equal 4. Right now it equals 17.
So basically it's doing (20/3) then dividing that by 2, then dividing 3 by [(20/3)/2], then multiplying that by 17. Not sure how to alter my grammar/rules/precedences to get it to read correctly. Any guidance would be appreciated, thanks.
%%
start: PROGRAMtoken IDtoken IStoken compoundstatement
compoundstatement: BEGINtoken {print_header();} statement semi ENDtoken {print_end();}
semi: SEMItoken statement semi
|
statement: IDtoken EQtoken exp
{ regs[$1] = $3; }
| PRINTtoken exp
{ cout << $2 << endl; }
| declaration
declaration: VARtoken IDtoken comma
comma: COMMAtoken IDtoken comma
|
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| factor TIMEStoken term
{$$ = $1 * $3;
}
| factor DIVIDEtoken term
{ $$ = $1 / $3;
}
factor: ICONSTtoken
{ $$ = $1;}
| IDtoken
{ $$ = regs[$1]; }
| LPARENtoken exp RPARENtoken
{ $$ = $2;}
%%
My tokens and types look like:
%token BEGINtoken
%token COMMAtoken
%left DIVIDEtoken
%left TIMEStoken
%token ENDtoken
%token EOFtoken
%token EQtoken
%token <value> ICONSTtoken
%token <value> IDtoken
%token IStoken
%token LPARENtoken
%left PLUStoken MINUStoken
%token PRINTtoken
%token PROGRAMtoken
%token RPARENtoken
%token SEMItoken
%token VARtoken
%type <value> exp
%type <value> term
%type <value> factor
You really wanted someone to work hard to give you an answer, which is why the question has hung around for a year. Have a read of this help page: https://stackoverflow.com/help/how-to-ask, in particular the part about simplifying the problem. There are lots of rules in your grammar file that were not needed to reproduce the problem. We did not need:
%token BEGINtoken
%token COMMAtoken
%token ENDtoken
%token EOFtoken
%token EQtoken
%token <value> IDtoken
%token IStoken
%token PROGRAMtoken
%token VARtoken
%%
start: PROGRAMtoken IDtoken IStoken compoundstatement
compoundstatement: BEGINtoken {print_header();} statement semi ENDtoken {print_end();}
semi: SEMItoken statement semi
|
| declaration
declaration: VARtoken IDtoken comma
comma: COMMAtoken IDtoken comma
|
You could have just removed these tokens and rules to get to the heart of the operator precedence issue. We did not need any variables, declarations, assignments or program structure to illustrate the failure. Learning to simplify is the heart of competent debugging and thus programming. If you'd done this simplification more people would have had a go at answering. I'm saying this not for the OP, but for those that will follow with similar problems!
I'm wondering what school is setting this assignment, as I've seen a fair number of yacc questions on SO around the same dumb problem. I suspect more will come here every year, so answering this will help them. I knew what the issue was on inspection of the grammar, but to test my solution I had to code up a working lexer, some symbol table routines, a main program and other ancillary code. Again, another deterrent for problem solvers.
Let's get to the heart of the problem. You have these token declarations:
%left DIVIDEtoken
%left TIMEStoken
%left PLUStoken MINUStoken
These tell yacc that, if any rules are ambiguous, the operators associate to the left. Your rules for these operators are:
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| factor TIMEStoken term
{$$ = $1 * $3;
}
| factor DIVIDEtoken term
{ $$ = $1 / $3;
}
However, these rules are not ambiguous, so the %left declarations are never consulted. Yacc simply follows the unambiguous grammar as written. The exp rules are left-recursive (exp PLUStoken term), so + and - already associate to the left. The term rules are right-recursive (factor TIMEStoken term), which makes * and / right-associative, the opposite of what you want. The arithmetic in your example shows exactly that: the chain of * and / is evaluated from the right. There were really big clues there, weren't there?
OK. How to change the associativity? One way would be to make the grammar ambiguous so that the %left declarations are used; the other is to flip the right-recursive rules around to invert their associativity. That's what I did for term, leaving exp as it was:
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| term TIMEStoken factor
{$$ = $1 * $3;
}
| term DIVIDEtoken factor
{ $$ = $1 / $3;
}
Do you see what I did there? I rotated the grammar rule around the operator.
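The effect of the recursion direction can be reproduced without yacc. This is a sketch in plain C with invented names (eval_left, eval_right); it evaluates a chain of * and / operators over integers, once grouped from the left as a left-recursive term rule does, and once grouped from the right as a right-recursive one does. The operands for the example from the question are 17, 3, 6 and 2, i.e. the parenthesized parts (20 + (-3)) and (20 / 3) already reduced with integer division.

```c
#include <assert.h>

/* Only '*' and '/' appear in this sketch; anything else is treated as '/'. */
static int apply(char op, int a, int b) {
    return op == '*' ? a * b : a / b;
}

/* "term: term OP factor": left recursion, so the chain groups from the left. */
int eval_left(const int *v, const char *ops, int n) {
    assert(n >= 1);
    int acc = v[0];
    for (int i = 1; i < n; i++)
        acc = apply(ops[i - 1], acc, v[i]);
    return acc;
}

/* "term: factor OP term": right recursion, so the chain groups from the right. */
int eval_right(const int *v, const char *ops, int n) {
    assert(n >= 1);
    if (n == 1)
        return v[0];
    return apply(ops[0], v[0], eval_right(v + 1, ops + 1, n - 1));
}
```

With the values {17, 3, 6, 2} and the operators "*//", eval_left computes ((17 * 3) / 6) / 2 = 4, the result the asker wanted, and eval_right computes 17 * (3 / (6 / 2)) = 17, the result the original grammar produced.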
Now for some more disclaimers. I said this is a dumb exercise. Interpreting expressions on the fly is a poor use of the yacc tool, and not what happens in real compilers or interpreters. In a realistic implementation, a parse tree would be built and the value calculations would be performed during the tree walk. This would then enable the issues of undeclared variables to be resolved (which also occurs in this exercise). The use of the regs array to store values is also dumb, because there is clearly an ancillary symbol table in use to return a unique integer ID for the symbols. In a real compiler/interpreter those values would also be stored in that symbol table.
I hope this tutorial has helped further students understand these parsing issues in their classwork.

Yacc/Bison: The pseudo-variables ($$, $1, $2,..) and how to print them using printf

I have a lexical analyser written in flex that passes tokens to my parser written in bison.
The following is a small part of my lexer:
ID [a-z][a-z0-9]*
%%
rule {
printf("A rule: %s\n", yytext);
return RULE;
}
{ID} {
printf( "An identifier: %s\n", yytext );
return ID;
}
"(" return LEFT;
")" return RIGHT;
There are other bits for parsing whitespace etc too.
Then part of the parser looks like this:
%{
#include <stdio.h>
#include <stdlib.h>
#define YYSTYPE char*
%}
%token ID RULE
%token LEFT RIGHT
%%
rule_decl :
RULE LEFT ID RIGHT { printf("Parsing a rule, its identifier is: %s\n", $2); }
;
%%
It's all working fine but I just want to print out the ID token using printf - that's all :). I'm not writing a compiler.. it's just that flex/bison are good tools for my software. How are you meant to print tokens? I just get (null) when I print.
Thank you.
I'm not an expert at yacc, but the way I've been handling the transition from the lexer to the parser is as follows: for each lexer token, you should have a separate rule to "translate" the yytext into a suitable form for your parser. In your case, you are probably just interested in yytext itself (while if you were writing a compiler, you'd wrap it in a SyntaxNode object or something like that). Try
%token ID RULE
%token LEFT RIGHT
%%
rule_decl:
RULE LEFT id RIGHT { printf("%s\n", $3); }
id:
ID { $$ = strdup(yytext); }
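The point is that the last rule makes yytext available as a $ variable that can be referenced by rules involving id. Note that the identifier is the third component of rule_decl (after RULE and LEFT), so it is $3, not the $2 used in the question.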
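The copy made by strdup matters because the scanner reuses the yytext buffer for every token, so a saved pointer into it would later show some other token's text. A minimal sketch of that copy in plain C, assuming a hypothetical helper named copy_token (not part of flex or bison) and using malloc and memcpy in place of the POSIX strdup:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a heap-allocated copy of the current token text, or NULL if the
 * allocation fails. Same effect as POSIX strdup(); the caller owns the
 * result and must free() it.
 */
char *copy_token(const char *text) {
    assert(text != NULL);
    size_t len = strlen(text) + 1;  /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}
```

If a buffer holding "foo" is copied with copy_token and then overwritten with ")", the copy still reads "foo". Since every identifier gets its own allocation, a long-running parser would also need to free these strings once it is done with them.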