Yacc/Bison: The pseudo-variables ($$, $1, $2,..) and how to print them using printf - yacc

I have a lexical analyser written in flex that passes tokens to my parser written in bison.
The following is a small part of my lexer:
ID [a-z][a-z0-9]*
%%
rule {
printf("A rule: %s\n", yytext);
return RULE;
}
{ID} {
printf( "An identifier: %s\n", yytext );
return ID;
}
"(" return LEFT;
")" return RIGHT;
There are other bits for parsing whitespace etc too.
Then part of the parser looks like this:
%{
#include <stdio.h>
#include <stdlib.h>
#define YYSTYPE char*
%}
%token ID RULE
%token LEFT RIGHT
%%
rule_decl :
RULE LEFT ID RIGHT { printf("Parsing a rule, its identifier is: %s\n", $2); }
;
%%
It's all working fine but I just want to print out the ID token using printf - that's all :). I'm not writing a compiler.. it's just that flex/bison are good tools for my software. How are you meant to print tokens? I just get (null) when I print.
Thank you.

I'm not an expert at yacc, but the way I've been handling the transition from the lexer to the parser is as follows: for each lexer token, you should have a separate rule to "translate" the yytext into a suitable form for your parser. In your case, you are probably just interested in yytext itself (while if you were writing a compiler, you'd wrap it in a SyntaxNode object or something like that). Try
%token ID RULE
%token LEFT RIGHT
%%
rule_decl:
RULE LEFT id RIGHT { printf("%s\n", $3); }
id:
ID { $$ = strdup(yytext); }
The point is that the last rule makes yytext available as a $ variable that can be referenced by rules involving id.

Related

Simple calculator program in lex and yacc not giving output

I am trying to write a very simple calculator program using lex and yacc but getting stuck in printing the output. The files are:
calc.l:
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {yylval = atoi(yytext); return NUMBER;}
[ \t] ;
\n return 0;
. return yytext[0];
%%
calc.y:
%{
#include <stdio.h>
void yyerror(char const *s) {
fprintf(stderr, "%s\n", s);
}
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf(" =%d\n", $1);}
;
expression: expression '+' NUMBER {$$ = $1 + $3;}
| expression '-' NUMBER {$$ = $1 - $3;}
| NUMBER {$$ = $1;}
;
The commands I have used:
flex calc.l
bison calc.y -d
gcc lex.yy.c calc.tab.c -lfl
./a.out
After running the last command although the program takes input from the keyboard but does not print anything, simply terminates. I didn't get any warning or error while compiling but it doesn't give any output. Please help.
You have no definition of main, so the main function in -lfl will be used. That library is for flex programs, and its main function will call yylex -- the lexical scanner -- until it returns 0.
You need to call the parser. Furthermore, you need to call it repeatedly, because your lexical scanner returns 0, indicating end of input, every time it reads a newline.
So you might use something like this:
int main(void) {
do {
yyparse();
} while (!feof(stdin));
return 0;
}
However, that will reveal some other problems. Most irritatingly, your grammar will not accept an empty input, so an empty line will trigger a syntax error. That will certainly happen at the end of the input, because the EOF will cause yylex to return 0 immediately, which is indistinguishable from an empty line.
Also, any error encountered during the parse will cause the parse to terminate immediately, leaving the remainder of the input line unread.
On the whole, it is often better for the scanner to return a newline token (or \n) for newline characters.
Other than the main function which you don't require, the only thing in -lfl is a default definition of yywrap. You could just define this function yourself (it only needs to return 1), or you could avoid the need for the function by adding
%option noyywrap
to your flex file. In fact, I usually recommend
%option noyywrap noinput nounput
which will avoid the compiler warnings (which you didn't see because you didn't supply -Wall when you compiled the program, which you should do.)
Another compiler warning will be avoided by adding a declaration of yylex to your bison input file before the definition of yyerror:
int yylex(void);
Finally, yylval is declared in y.tab.h, so there is no need for extern int yylval; in your flex file. In this case, it doesn't hurt, but if you change the type of the semantic value, which you will probably eventually want to do, this line will need to be changed as well. Better to just eliminate it.

How to Read Multiple Lines of input file for arithmetic yacc program?

I am new to compilers and learning to make calculator that inputs multiple line equations (one equation each line) from a .txt file. And I am facing the problem of segmentation fault.
YACC Code :
%{
#include <stdio.h>
#include <string.h>
#define YYSTYPE int /* the attribute type for Yacc's stack */
extern int yylval; /* defined by lex, holds attrib of cur token */
extern char yytext[]; /* defined by lex and holds most recent token */
extern FILE * yyin; /* defined by lex; lex reads from this file */
%}
%token NUM
%%
Begin : Line
| Begin Line
;
Line : Calc {printf("%s",$$); }
;
Calc : Expr {printf("Result = %d\n",$1);}
Expr : Fact '+' Expr { $$ = $1 + $3; }
| Fact '-' Expr { $$ = $1 - $3; }
| Fact '*' Expr { $$ = $1 * $3; }
| Fact '/' Expr { $$ = $1 / $3; }
| Fact { $$ = $1; }
| '-' Expr { $$ = -$2; }
;
Fact : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id : NUM { $$ = yylval; }
;
%%
void yyerror(char *mesg); /* this one is required by YACC */
main(int argc, char* *argv){
char ch;
if(argc != 2) {printf("useage: calc filename \n"); exit(1);}
if( !(yyin = fopen(argv[1],"r")) ){
printf("cannot open file\n");exit(1);
}
yyparse();
}
void yyerror(char *mesg){
printf("Bad Expression : %s\n", mesg);
exit(1); /* stop after the first error */
}
LEX Code :
%{
#include <stdio.h>
#include "y.tab.h"
int yylval; /*declared extern by yacc code. used to pass info to yacc*/
%}
letter [A-Za-z]
digit [0-9]
num ({digit})*
op "+"|"*"|"("|")"|"/"|"-"
ws [ \t\n]
other .
%%
{ws} { /* note, no return */ }
{num} { yylval = atoi(yytext); return NUM;}
{op} { return yytext[0];}
{other} { printf("bad%cbad%d\n",*yytext,*yytext); return '?'; }
%%
/* c functions called in the matching section could go here */
I am trying to print the expression along with result.
Thanks In Advance.
In your parser, you have:
Line : Calc {printf("%s",$$); }
Now $$ is the semantic value which the rule is computing, and you haven't assigned anything to it. So it would not be unreasonable to assume that it is undefined, which would be bad, but in fact it does have a value because of the default rule $$ = $1;. All the same, it would be much more readable to write
printf("%s", $1);
But that's not correct, is it? After all, you have
#define YYSTYPE int
so all semantic types are integers. But you're telling printf that $1 is a string (%s). printf will believe you, so it will go ahead and try to dereference the int as though it were a char*, with predictable results (i.e., a segfault).
You are probably using a compiler which is clever enough to notice the fact that you are trying to print an int with a %s format code. But either you haven't asked the compiler to help you or you are ignoring its advice.
Always compile with warnings enabled. If you are using gcc or clang, that means putting -Wall in the command line. (If you are using some other compiler, find out how to produce warnings. It will be documented.) And then read the warnings and fix them before trying to run the program.
There are several other errors and/or questionable practices in your code. Your grammar is inaccurate (why do you use fact as the left-hand operand of every operator?), and despite your comment, your lexical scanner ignores newline characters, so there is no way the parser can know whether expressions are one per line, two per line, or spread over multiple lines; that will make it hard to use the calculator as a command-line tool.
There is no need to define the lex macro digit; (f)lex recognizes the Posix character class [[:digit:]] (and others, documented here) automatically. Nor is it particularly useful to define the macro num. Overuse of lex macros makes your program harder to read; it is usually better to just write the patterns out in place:
[[:digit:]]+ { yylval = atoi(yytext); return NUM; }
which would be more readable and less work both for you and for anyone reading your code. (If your professor or tutor disagrees, I'd be happy to discuss the matter with them directly.)

Runtime "syntax error" from lex and yacc

I cannot figure out why I am getting these results.
++
+add
+syntax error 2
++
+add
+syntax error 4
The ++ is my input and lex echoes each character and yacc prints add whenever it gets a +. It's giving me this error on every other + it gets. Doesn't matter how I give the input, I get the same results if I hit enter on every +.
lex
%{
#include "y.tab.h"
int chars = 0;
%}
%%
"+" {ECHO; chars++; return ADD;}
. {ECHO; chars++;}
\n {ECHO;}
%%
yacc
%{
#include <stdio.h>
extern int chars;
void yyerror (const char *str) {
printf ("%s %d\n", str, chars);
}
%}
%token ADD
%%
symbol : ADD {printf ("add\n");}
;
%%
int main () {
while (1) {
yyparse ();
}
}
Your grammar only accepts a 'sentence' that consists of a single token, +. When you type a second +, you induce a syntax error; your grammar doesn't allow ADD followed by ADD. Your next token after the + must be EOF for the grammar to accept your input. (Because of the . and \n rules, you can type all sorts of other stuff at the code, but there can only be one + in the input.)

Lex & Yacc - Grammar rules

I am trying to learn lex and yacc.
I am struggling to understand how to do the grammar rules.
My file has already been defined like:
fd 3x00
bk 100
setc 100
int xy3 fd 10 rt 90
rt
My output with the printf and printing to a file went something like this:
Keyword: fd
Illegal: 3x00
Keyword: bk
Keyword: setc
Number: 100
Keyword: int
Id: xy3
Keyword: fd
Number: 10
Keyword: rt
Number: 90
Here is my lex file - im only going to show part of it to keep this post as small as possible
fd {return FD; }
[0-9]+[a-z]+[0-9]+ {} // this is the illegal entry 3x00
[\r\t\n]+ {}
bk {return BK;}
setc {return SETC;}
[-+]?[0-9]+ {yyval.ival = atoi(yytext); return NUMBER;}
int {fprintf(yyout, "%s\n", yytext);}
xy3 {fprintf(yyout, "%s\n", yytext);}
fd[0-9]+ {fprintf(yyout, "%s\n", yytext);}
%%
Here is my yacc file. It is not complete since i dont know how to finish it.
%{
#include <ctype.h>
#include <stdio.h>
%}
%token NUMBER
%token ID
%token FD
%token BK
%token SETC
%token KEYWORD
%%
%%
main()
{
yyparse()
}
I am not sure how i would write the grammar rules for these.
Can i make my own name for the expression?
can anyone help me with one example so i can see how to finish it?
The rules should be like this:
statement: command arg {printf("Keyword: %s\n", $1);};
command: KEYWORD {$$ = $1;}
|FD {$$ = $1;}
|BK {$$ = $1;};
arg: NUMBER {printf("Number: %s\n", $1);}
|ID {printf("Id: %s\n", $1);};
That means, you should define the syntactical rules in this way. Separate alternative definitions by |, and write the desired actions in a { } block for each rule. Finish each rule with a ;. When you refer to the the tokens, use $n where n is the position of the token in the rule. The rule header can be referred to using $$.

How do I handle newlines in a Bison Grammar, without allowing all characters?

I've gone right back to basics to try and understand how the parser can match an input line such as "asdf", or any other jumble of characters, where there is no rule defined for this.
My lexer:
%{
#include
%}
%%
"\n" {return NEWLINE; }
My Parser:
%{
#include <stdlib.h>
%}
% token NEWLINE
%%
program:
| program line
;
line: NEWLINE
;
%%
#include <stdio.h>
int yyerror(char *s)
{
printf("%s\n", s);
return(0);
}
int main(void)
{
yyparse();
exit(0);
}
It is my understanding that this, when compiled and run should accept nothing more than empty blank lines, but it will also allow any strings to be input without a syntax error.
What am I missing?
Thanks
Currently, your lexer echos and ignores all non-newline characters (that's the default action in lex for characters that don't match any rule), so the parser will only ever see newlines.
In general, your lexer needs to do something with any/every possible input character. It can ignore them (silently or with a message), or return tokens for the parser. The usual approach is to have the last lexer rule be:
. return *yytext;
which matches any single character (other than a newline) and sends it on to the parser as-is. This is the last rule, so that any earlier rule that matches a single character takes precedence,
This is completely independent of the parser, which only sees that part of the input the lexer gives it.
You have default rules. Add the option nodefault in order to solve your problem. Your lexer will then look like this instead:
%option nodefault
%{
#include <stdlib.h>
%}
%%
"\n" {return NEWLINE; }