I have recently tried using GNU Bison and Flex to write an interpreter. The text I want the interpreter to recognize is print "Hello", and I have tried the following:
flex file:
%{
#include <iostream>
using namespace std;
#define YY_DECL extern "C" int yylex()
#include "gbison.tab.h"
%}
%%
[ \t\n] ;
'\"' return QUOTE;
[a-zA-Z0-9]+ { yylval.sval = strdup(yytext); return STRING; }
%%
bison file:
%{
#include <cstdio>
#include <cstring>
#include <iostream>
using namespace std;
extern "C" int yylex();
extern "C" int yyparse();
extern "C" FILE* yyin;
void yyerror (const char* s);
%}
%union {
char* sval;
}
%token <sval> STRING
%token QUOTE
%%
str:
STRING QUOTE STRING QUOTE
{
if (strcmp($1, "print") == 0)
{
cout << $3 << flush;
}
if (strcmp($1, "println") == 0)
{
cout << $3 << endl;
}
}
;
%%
int main(int argc, char* argv[])
{
FILE* input = fopen(argv[1], "r");
if (!input)
{
cout << "Bad input. Nonexistent file" << endl;
return -1;
}
yyin = input;
do
{
yyparse();
} while (!feof(yyin));
}
void yyerror(const char* s)
{
cout << "Error. " << s << endl;
exit(-1);
}
But when I pass print "hello" to the compiled program I get:
"Error. syntax error
I think that the issue is the STRING QUOTE STRING QUOTE but I am not sure. What exactly is going wrong? How would I get the interpreter to print hello?
The answers are below, but I hope the following is more generally useful, as fishing instruction.
There are a variety of debugging tools which would help you. In particular, flex provides the -d flag:
-d, --debug
makes the generated scanner run in "debug" mode. Whenever a pattern is recognized and the global variable yy_flex_debug is non-zero (which is the default), the scanner will write to stderr a line… (flex manual)
bison also provides a debug facility. (bison manual)
There are several means to enable compilation of trace facilities:
the macro YYDEBUG…
the option -t (POSIX Yacc compliant)…
the option --debug (Bison extension)…
the directive %debug…
We suggest that you always enable the debug option so that debugging
is always possible.
…
Once you have compiled the program with trace facilities, the way to
request a trace is to store a nonzero value in the variable yydebug.
You can do this by making the C code do it (in main, perhaps), or you
can alter the value with a C debugger.
Also, remember that flex inserts an automatic rule which causes any otherwise unrecognized character to be echoed to the output. ("By default, any text not matched by a flex scanner is copied to the output" -- Some simple examples) That's why you have the extra " in the error message being printed by your program:
"Error. syntax error
^
That's a bit subtle, though. Tracing flex would have shown you that more directly.
So, finally, the problem(s):
The flex pattern '\"' does not match a ". It matches '"', because single quotes are not special to flex. That's definitely why your parse fails.
Fixing that will let your program parse a single command, but it will generate a syntax error if you try to give it two print commands in the same input. That's because bison always parses until it receives an END token from the lexer, and the lexer (by default) only provides an END token when it reaches the end of the input. You can change
the lexer behaviour (by sending END in other circumstances, for example a new-line) (not recommended)
the parser behaviour (by using ACCEPT) (possible, but rarely necessary)
the grammar, so that it recognizes any number of statements. (recommended)
I am trying to write a very simple calculator program using lex and yacc but getting stuck in printing the output. The files are:
calc.l:
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {yylval = atoi(yytext); return NUMBER;}
[ \t] ;
\n return 0;
. return yytext[0];
%%
calc.y:
%{
#include <stdio.h>
void yyerror(char const *s) {
fprintf(stderr, "%s\n", s);
}
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf(" =%d\n", $1);}
;
expression: expression '+' NUMBER {$$ = $1 + $3;}
| expression '-' NUMBER {$$ = $1 - $3;}
| NUMBER {$$ = $1;}
;
The commands I have used:
flex calc.l
bison calc.y -d
gcc lex.yy.c calc.tab.c -lfl
./a.out
After I run the last command, the program takes input from the keyboard but prints nothing and simply terminates. I didn't get any warnings or errors while compiling. Please help.
You have no definition of main, so the main function in -lfl will be used. That library is for flex programs, and its main function will call yylex -- the lexical scanner -- until it returns 0.
You need to call the parser. Furthermore, you need to call it repeatedly, because your lexical scanner returns 0, indicating end of input, every time it reads a newline.
So you might use something like this:
int main(void) {
do {
yyparse();
} while (!feof(stdin));
return 0;
}
However, that will reveal some other problems. Most irritatingly, your grammar will not accept an empty input, so an empty line will trigger a syntax error. That will certainly happen at the end of the input, because the EOF will cause yylex to return 0 immediately, which is indistinguishable from an empty line.
Also, any error encountered during the parse will cause the parse to terminate immediately, leaving the remainder of the input line unread.
On the whole, it is often better for the scanner to return a newline token (or simply the character '\n') for newline characters, and to let the grammar handle line ends.
Other than the main function which you don't require, the only thing in -lfl is a default definition of yywrap. You could just define this function yourself (it only needs to return 1), or you could avoid the need for the function by adding
%option noyywrap
to your flex file. In fact, I usually recommend
%option noyywrap noinput nounput
which will avoid some compiler warnings. (You didn't see them because you didn't supply -Wall when you compiled the program, which you should do.)
Another compiler warning will be avoided by adding a declaration of yylex to your bison input file before the definition of yyerror:
int yylex(void);
Finally, yylval is declared in y.tab.h, so there is no need for extern int yylval; in your flex file. In this case, it doesn't hurt, but if you change the type of the semantic value, which you will probably eventually want to do, this line will need to be changed as well. Better to just eliminate it.
I have built a trivial compiler using Flex and Bison which is supposed to recognize a simple string in a source file and I use the standard error stream to output a message if the string is recognized correctly.
Below is my code and my unexpected result.
This is the source file (testsource.txt) with the string I try to recognize:
\end{document}
This is the Flex file (UnicTextLang.l):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
extern int yyparse();
%}
%%
^\\end\{document\}$ { yyerror("end matched"); return END; }
/* skip whitespace */
[ \t] ;
/* anything else is an error */
. yyerror("invalid character");
%%
int main(int argc, char *argv[]) {
if ( argc < 3 )
yyerror("You need 2 args: inputFileName outputFileName");
else {
yyin = fopen(argv[1], "r");
yyout = fopen(argv[2], "w");
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
This is the Bison file (UnicTextLang.y):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
%}
%token END
%%
document:
END
|
;
%%
int yywrap(void) {
return 1;
}
void yyerror(char *s) {
fprintf(stderr, "%s\n", s); /* Prints to the standard error stream */
}
I run the following commands:
flex UnicTextLang.l
bison -dl -o y.tab.c UnicTextLang.y
gcc lex.yy.c y.tab.c -o UnicTextLang
UnicTextLang.exe testsource.txt output.txt
What I expect to see printed in the console is
end matched
But this is what I get:
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
What’s wrong?
This issue is caused by the end-of-line code for a Windows machine being two characters (\r\n) when on other systems it is one (\n).
This is explained in the flex manual:
‘r$’
an ‘r’, but only at the end of a line (i.e., just before a newline). Equivalent to ‘r/\n’.
Note that flex’s notion of “newline” is exactly whatever the C compiler used to compile flex interprets ‘\n’ as; in particular, on some DOS systems you must either filter out ‘\r’s in the input yourself, or explicitly use ‘r/\r\n’ for ‘r$’.
The quick solution is to change:
^\\end\{document\}$
to
^\\end\{document\}\r\n
However, if your expression is at the end-of-file without an end-of-line, which is possible in Windows, then you would have to specifically match that case also. Flex does permit the matching of end-of-file with:
<<EOF>>
but this will cause all kinds of other side effects and it is often easier not to anchor the pattern to the end (of line or file).
I ran the following command in the Windows command prompt:
yacc -d calci.y
After successful execution it generates 2 files: calci.tab.c and calci.tab.h. But it should have generated y.tab.c and y.tab.h.
I am very new to lex and yacc, so I do not have an idea about the error.
Also, it gives me the following error when I try to run command:
cc lex.yy.c calci.tab.c -o out.exe:
error: calci.l:3:23: fatal error: y.tab.h: No such file or directory
compilation terminated.
Please give some suggestion.
yacc program:
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER
%%
program:
program expr '\n' { printf("%d\n", $2); }
|
;
expr:
INTEGER
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
lex program:
%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}
[-+\n] { return *yytext; }
[ \t] ; /* skip whitespace */
. yyerror("Unknown character");
%%
int yywrap(void) {
return 1;
}
Just accept that bison will name its output files based on the name of its input file.
Creating files called y.tab.c and y.tab.h is the legacy behaviour of the original yacc tool; with current bison versions, you can achieve compatible behaviour by supplying the -y command-line option to bison. But I don't recommend doing that for new code; it will also change some details of the parser's behaviour in order to be legacy-compatible, and if you don't have legacy code those behaviours may not be desirable.
Basing the names of the bison-generated files on the input files makes it possible to have more than one bison source file in the same directory. If you don't want to use the name of the source file, you can specify an explicit output file name with the -o option (and the --defines option if you want the header file's name to have a different prefix than the source file).
All that means you need to change the name of the file being included into the lexer, so the line will become
#include "calci.tab.h"
(assuming you don't use the -o/--defines options.)
I cannot figure out why I am getting these results.
++
+add
+syntax error 2
++
+add
+syntax error 4
The ++ is my input and lex echoes each character and yacc prints add whenever it gets a +. It's giving me this error on every other + it gets. Doesn't matter how I give the input, I get the same results if I hit enter on every +.
lex
%{
#include "y.tab.h"
int chars = 0;
%}
%%
"+" {ECHO; chars++; return ADD;}
. {ECHO; chars++;}
\n {ECHO;}
%%
yacc
%{
#include <stdio.h>
extern int chars;
void yyerror (const char *str) {
printf ("%s %d\n", str, chars);
}
%}
%token ADD
%%
symbol : ADD {printf ("add\n");}
;
%%
int main () {
while (1) {
yyparse ();
}
}
Your grammar only accepts a "sentence" that consists of a single token, +. When you type a second +, you induce a syntax error; your grammar doesn't allow ADD followed by ADD. Your next token after the + must be EOF for the grammar to accept your input. (Because of the . and \n rules, you can type all sorts of other characters into the program, but there can only be one + in the input.)
I have a lexical analyser written in flex that passes tokens to my parser written in bison.
The following is a small part of my lexer:
ID [a-z][a-z0-9]*
%%
rule {
printf("A rule: %s\n", yytext);
return RULE;
}
{ID} {
printf( "An identifier: %s\n", yytext );
return ID;
}
"(" return LEFT;
")" return RIGHT;
There are other bits for parsing whitespace etc too.
Then part of the parser looks like this:
%{
#include <stdio.h>
#include <stdlib.h>
#define YYSTYPE char*
%}
%token ID RULE
%token LEFT RIGHT
%%
rule_decl :
RULE LEFT ID RIGHT { printf("Parsing a rule, its identifier is: %s\n", $2); }
;
%%
It's all working fine but I just want to print out the ID token using printf - that's all :). I'm not writing a compiler.. it's just that flex/bison are good tools for my software. How are you meant to print tokens? I just get (null) when I print.
Thank you.
I'm not an expert at yacc, but the way I've been handling the transition from the lexer to the parser is as follows: for each lexer token, you should have a separate rule to "translate" the yytext into a suitable form for your parser. In your case, you are probably just interested in yytext itself (while if you were writing a compiler, you'd wrap it in a SyntaxNode object or something like that). Try
%token ID RULE
%token LEFT RIGHT
%%
rule_decl:
RULE LEFT id RIGHT { printf("%s\n", $3); }
;
id:
ID { $$ = strdup(yytext); }
;
The point is that the last rule makes yytext available as a $ variable that can be referenced by rules involving id.