I'm trying to understand how compilers and programming languages are made. To do so, I decided to write a simple calculator that does just addition and subtraction. Below are the Lex and Yacc files I wrote.
calc.yacc file:
%{
#include <stdio.h>
#include <stdlib.h>
extern int yylex();
void yyerror(char *);
%}
%union { int number; }
%start line
%token <number> NUM
%type <number> expression
%%
line: expression { printf("%d\n", $1); };
expression: expression '+' NUM { $$ = $1 + $3; };
expression: expression '-' NUM { $$ = $1 - $3; };
expression: NUM { $$ = $1; };
%%
void yyerror(char *s) {
fprintf(stderr, "%s", s);
exit(1);
}
int main() {
yyparse();
return 0;
}
calc.lex file:
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+ {
yylval.number = atoi(yytext);
return NUM;
}
[-+] { return yytext[0]; }
[ \t\f\v\n] { ; }
%%
int yywrap() {
return 1;
}
It compiles cleanly, but when I run it and type something like 2 + 4 it hangs and doesn't print the answer. Can somebody explain why? My guess is that my grammar is incorrect, but I don't see how.
I came to the same conclusion as rici and changed your samples accordingly:
file calc.l:
%{
#include <stdio.h>
#include <stdlib.h>
#include "calc.y.h"
%}
%%
[0-9]+ {
yylval.number = atoi(yytext);
return NUM;
}
[-+] { return yytext[0]; }
"\n" { return EOL; }
[ \t\f\v] { ; }
%%
int yywrap() {
return 1;
}
file calc.y:
%{
#include <stdio.h>
#include <stdlib.h>
extern int yylex();
void yyerror(char *);
%}
%union { int number; }
%start input
%token EOL
%token <number> NUM
%type <number> expression
%%
input: line input | line ;
line: expression EOL { printf("%d\n", $1); };
expression: expression '+' NUM { $$ = $1 + $3; };
expression: expression '-' NUM { $$ = $1 - $3; };
expression: NUM { $$ = $1; };
%%
void yyerror(char *s) {
fprintf(stderr, "%s", s);
exit(1);
}
int main() {
yyparse();
return 0;
}
Compiled & tested in cygwin on Windows 10 (64 bit):
$ flex -o calc.l.c calc.l
$ bison -o calc.y.c -d calc.y
$ gcc -o calc calc.l.c calc.y.c
$ ./calc
2 + 4
6
2 - 4
-2
234 + 432
666
Notes:
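Minor issue: Because my build commands name the generated files differently, I had to change the #include for the generated header to calc.y.h. (A matter of taste.)
I introduced the EOL token in the lex source as well as in the line rule of the parser. Without it, the newline is discarded as whitespace, so the parser cannot tell that the expression is complete until it sees the next token or the end of input. That is why the original program appears to hang: it only prints its answer once the input ends (for example, after Ctrl+D on a Unix-like terminal).
While testing, I noticed that the second input always ended in a syntax error. It took me a while to realize that the grammar now accepted exactly one line. I therefore added the recursive input rule to the parser source.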
This question follows on from this post: https://stackoverflow.com/questions/42848197/bison-flex-cannot-print-out-result?noredirect=1#comment72805876_42848197
This time I am trying to make my calculator program accept both integers and floating-point numbers.
Thank you.
Here is my code
Flex:
%{
#include <stdio.h>
#include "f1.tab.h"
%}
integer [1-9][0-9]*|0
float [0-9]+\.[0-9]+
%%
{integer} { yylval.ival = atoi(yytext); return INT; }
{float} { yylval.fval = atof(yytext); return FLOAT; }
. { return yytext[0]; }
%%
Bison :
%{
#include <stdio.h>
%}
%union {
int ival;
float fval;
}
%token <ival> INT
%token <fval> FLOAT
%type <fval> exp
%type <fval> fac
%type <fval> f
%%
input: line
| input line
;
line: exp ';' { printf("%d\n", $1); };
exp: fac { $$ = $1; }
| exp '+' fac { $$ = $1 + $3; }
| exp '-' fac { $$ = $1 - $3; }
;
fac: f
| fac '*' f { $$ = $1 * $3; }
| fac '/' f { $$ = $1 / $3; }
;
f: INT | FLOAT;
%%
main(int argc, char **argv) {
yyparse();
}
yyerror(char *s) {
fprintf(stderr, "error: %s\n", s);
}
Bison tells you exactly what the problem is:
parser.y:32.4-6: warning: type clash on default action: <fval> != <ival> [-Wother]
f: INT | FLOAT;
^^^
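The default action for the rule f: INT copies an ival to an fval without any sort of conversion (basically, copying via the union). To fix it, you need to insert an explicit conversion:
f: INT { $$ = (float)$1; }
| FLOAT
;
Note also that the line rule prints an <fval> with %d; it needs a floating-point format such as %g.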
I have the following code for lex and yacc. I am getting extra values in the printed output. Can anyone tell me what's wrong with the code?
Lex code:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[ \t] ;
[+-] { yylval=yytext; return Sym;}
(s|c|t)..x { yylval=yytext; return Str;}
[a-zA-Z]+ { printf("Invalid");}
%%
int yywrap()
{
return 1;
}
yacc code:
%{
#include<stdio.h>
%}
%start exps
%token Sym Str
%%
exps: exps exp
| exp
;
exp : Str Sym Str {printf("%s",$1); printf("%s",$2); printf("%s",$3);}
;
%%
int main (void)
{
while(1){
return yyparse();
}
}
yyerror(char *err) {
fprintf(stderr, "%s\n",err);
}
Input:
sinx+cosx
output:
sinx+cosx+cosxcosx
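Look at the output of the code!
yytext is a pointer into flex's internal scanning buffer, so its contents will be modified when the next token is read. If you want to pass the text to the parser, you need to make a copy (strdup is declared in <string.h>, and YYSTYPE must be a pointer type such as char * for the assignment to be valid):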
[+-] { yylval=strdup(yytext); return Sym;}
(s|c|t)..x { yylval=strdup(yytext); return Str;}
Where symbols are a single character, it might make more sense to return that character directly in the scanner:
[-+] { return *yytext; }
in which case your yacc rules should use the character directly, in single quotes:
exp : Str '+' Str {printf("%s + %s",$1, $3); free($1); free($3); }
| Str '-' Str {printf("%s - %s",$1, $3); free($1); free($3); }
I am trying to run a calculator example I found online, but I get this error every time I run my gcc command. Here are the commands that I run:
flex -l calc3.l
yacc -vd calc3.y
gcc y.tab.c -lm -ll
-> at this point I got this error message:
/tmp/ccPOq58f.o : In function 'yyparse':
y.tab.c: undefined reference to 'yylex'
collect2: error: ld returned 1 exit status
Here is my code:
calc3.l
%{
#include <stdlib.h>
#include "calc3.h"
#include "y.tab.h"
void yyerror(char *);
%}
%%
[a-z] {
yylval.sIndex = *yytext - 'a';
return VARIABLE;
}
0 {
yylval.iValue = atoi(yytext);
return INTEGER;
}
[1-9][0-9]* {
yylval.iValue = atoi(yytext);
return INTEGER;
}
[-()<>=+*/;{}.] {
return *yytext;
}
">=" return GE;
"<=" return LE;
"==" return EQ;
"!=" return NE;
"while" return WHILE;
"if" return IF;
"else" return ELSE;
"print" return PRINT;
[ \t\n]+ ; /* ignore whitespace */
. yyerror("Unknown character");
%%
int yywrap(void) {
return 1;
}
here is calc3.h
typedef enum { typeCon, typeId, typeOpr } nodeEnum;
/* constants */
typedef struct {
int value; /* value of constant */
} conNodeType;
/* identifiers */
typedef struct {
int i; /* subscript to sym array */
} idNodeType;
/* operators */
typedef struct {
int oper; /* operator */
int nops; /* number of operands */
struct nodeTypeTag **op; /* operands */
} oprNodeType;
typedef struct nodeTypeTag {
nodeEnum type; /* type of node */
union {
conNodeType con; /* constants */
idNodeType id; /* identifiers */
oprNodeType opr; /* operators */
};
} nodeType;
extern int sym[26];
and here is calc3.y
%{
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include "calc3.h"
/* prototypes */
nodeType *opr(int oper, int nops, ...);
nodeType *id(int i);
nodeType *con(int value);
void freeNode(nodeType *p);
int ex(nodeType *p);
int yylex(void);
void yyerror(char *s);
int sym[26]; /* symbol table */
%}
%union {
int iValue; /* integer value */
char sIndex; /* symbol table index */
nodeType *nPtr; /* node pointer */
};
%token <iValue> INTEGER
%token <sIndex> VARIABLE
%token WHILE IF PRINT
%nonassoc IFX
%nonassoc ELSE
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
%type <nPtr> stmt expr stmt_list
%%
program:
function { exit(0); }
;
function:
function stmt { ex($2); freeNode($2); }
| /* NULL */
;
stmt:
';' { $$ = opr(';', 2, NULL, NULL); }
| expr ';' { $$ = $1; }
| PRINT expr ';' { $$ = opr(PRINT, 1, $2); }
| VARIABLE '=' expr ';' { $$ = opr('=', 2, id($1), $3); }
| WHILE '(' expr ')' stmt { $$ = opr(WHILE, 2, $3, $5); }
| IF '(' expr ')' stmt %prec IFX { $$ = opr(IF, 2, $3, $5); }
| IF '(' expr ')' stmt ELSE stmt { $$ = opr(IF, 3, $3, $5, $7); }
| '{' stmt_list '}' { $$ = $2; }
;
stmt_list:
stmt { $$ = $1; }
| stmt_list stmt { $$ = opr(';', 2, $1, $2); }
;
expr:
INTEGER { $$ = con($1); }
| VARIABLE { $$ = id($1); }
| '-' expr %prec UMINUS { $$ = opr(UMINUS, 1, $2); }
| expr '+' expr { $$ = opr('+', 2, $1, $3); }
| expr '-' expr { $$ = opr('-', 2, $1, $3); }
| expr '*' expr { $$ = opr('*', 2, $1, $3); }
| expr '/' expr { $$ = opr('/', 2, $1, $3); }
| expr '<' expr { $$ = opr('<', 2, $1, $3); }
| expr '>' expr { $$ = opr('>', 2, $1, $3); }
| expr GE expr { $$ = opr(GE, 2, $1, $3); }
| expr LE expr { $$ = opr(LE, 2, $1, $3); }
| expr NE expr { $$ = opr(NE, 2, $1, $3); }
| expr EQ expr { $$ = opr(EQ, 2, $1, $3); }
| '(' expr ')' { $$ = $2; }
;
%%
nodeType *con(int value) {
nodeType *p;
/* allocate node */
if ((p = malloc(sizeof(nodeType))) == NULL)
yyerror("out of memory");
/* copy information */
p->type = typeCon;
p->con.value = value;
return p;
}
nodeType *id(int i) {
nodeType *p;
/* allocate node */
if ((p = malloc(sizeof(nodeType))) == NULL)
yyerror("out of memory");
/* copy information */
p->type = typeId;
p->id.i = i;
return p;
}
nodeType *opr(int oper, int nops, ...) {
va_list ap;
nodeType *p;
int i;
/* allocate node */
if ((p = malloc(sizeof(nodeType))) == NULL)
yyerror("out of memory");
if ((p->opr.op = malloc(nops * sizeof(nodeType *))) == NULL)
yyerror("out of memory");
/* copy information */
p->type = typeOpr;
p->opr.oper = oper;
p->opr.nops = nops;
va_start(ap, nops);
for (i = 0; i < nops; i++)
p->opr.op[i] = va_arg(ap, nodeType*);
va_end(ap);
return p;
}
void freeNode(nodeType *p) {
int i;
if (!p) return;
if (p->type == typeOpr) {
for (i = 0; i < p->opr.nops; i++)
freeNode(p->opr.op[i]);
free (p->opr.op);
}
free (p);
}
void yyerror(char *s) {
fprintf(stdout, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
If you just use
flex calc3.l
then flex produces a scanner called lex.yy.c. (I removed the -l option which was used in the original question. -l causes flex to be more compatible with certain aspects of the original lex utility, and it has no use except for compiling ancient lex scanners.)
Similarly, if you just use
yacc -vd calc3.y
then bison (invoked here as yacc) will produce files called y.tab.c and y.tab.h. And
gcc y.tab.c -lm -ll
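would produce an executable called a.out, except that it fails to link: yylex is defined in lex.yy.c, which is missing from the gcc command line. That is the cause of the undefined reference.
Relying on these default names is not a good idea. It's far better to give the files meaningful names, based on the input filenames. All three of these tools understand a -o command-line flag which specifies the output file name.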
So you could do this:
flex calc3.l
yacc -vd calc3.y
gcc lex.yy.c y.tab.c -lm -ll
But I'd recommend something like this:
flex -o calc3.lex.c calc3.l
bison -o calc3.tab.c -vd calc3.y
gcc -o calc3 calc3.lex.c calc3.tab.c -lm -ll
When you do this, you'll need to change the #include "y.tab.h" to #include "calc3.tab.h". (Note that if you invoke bison as bison rather than as yacc, it will automatically produce output files with names based on the grammar file. But it doesn't hurt to be explicit.)
It is even better to put these commands in a Makefile, or at least a script file.
A simple calculator that supports only + - * / and integers. I use GNU/Linux.
hoc1.l:
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[ \t] { ; }
[0-9]+ { sscanf(yytext, "%d", &yylval); printf("\nget %d\n", yylval); return NUMBER; }
\n {return 0;}
%%
int yywrap(void) {
return 1;
}
hoc1.y
%{
#include<stdio.h>
#define YYSTYPE int
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
list:
| list '\n'
| list expr '\n' {printf("\t%d\n",$2);}
;
expr: NUMBER { $$ = $1; }
| expr '+' expr {$$ = $1+$3;}
| expr '-' expr {$$ = $1-$3;}
| expr '*' expr {$$ = $1*$3;}
| expr '/' expr {$$ = $1/$3;}
;
%%
int main(void)
{
yyparse();
return 0;
}
int yyerror(char *s) {
fprintf(stderr, "*%s*\n", s);
return 0;
}
runtime-error:
% ./hoc
8+9
get 8
+
get 9
*syntax error*
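Why does this happen, and how can I solve it? Thanks!
You forgot to include your operators in your lex file, so flex's default rule simply echoes them to the output (the lone + in your transcript) and the parser never sees them. Also, yylex must return a nonzero token code for every token it recognizes: returning 0 tells the parser that the input has ended. Remove the line in your lex file that handles the newline character and replace it with the following: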
[-+*/\n] { return *yytext; }
. { yyerror("unrecognized character"); return 0; }
Now it should work. Returning *yytext allows your yacc grammar to parse an expression successfully, e.g. if you get a '+', return it to allow the grammar to parse properly.