How do I parse command-line arguments in yacc?
Of course, I undefined input in both lex and yacc and then wrote:
int input(void)
{
printf("in input\n:");
char c;
if(target > limit)
return 0;
if((c = target[0][offset++]) != '\0')
return (c);
target++;
offset =0;
return (' ');
}
where target contains the command-line arguments. But only the standard input is being read. How do I make this input function get executed?
Did you mean you want your generated parser to accept command-line arguments? Then you need to add those arguments to the main function. The lexer input is a FILE* called yyin, and it is initialized to stdin in the lexer. You can change the default behavior with:
#include <stdio.h>
extern FILE* yyin;
int yyparse(void); /* generated by yacc */
int main(int argc, char** argv)
{
    if(argc==2)
    {
        /* read from the named file instead of stdin */
        yyin = fopen(argv[1], "r");
        if(!yyin)
        {
            fprintf(stderr, "can't read file %s\n", argv[1]);
            return 1;
        }
    }
    return yyparse();
}
If you want your own function to be executed instead of the one provided by flex, you need to define the YY_INPUT macro.
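As a rough sketch of that last suggestion: a flex scanner fills its buffer through YY_INPUT rather than by calling input(), which is why redefining input() has no effect. The helper below is one possible way to hand the command-line arguments to the scanner. The names args_init and args_read are hypothetical (they are not part of flex), and joining the arguments with single spaces is an assumption carried over from the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: the arguments that are still to be scanned. */
static char **arg_cur;   /* argument currently being read */
static char **arg_end;   /* one past the last argument */
static size_t arg_off;   /* offset inside *arg_cur */

/* Call once from main(), for example args_init(argc - 1, argv + 1). */
void args_init(int count, char **args)
{
    arg_cur = args;
    arg_end = args + count;
    arg_off = 0;
}

/* Copy up to max_size bytes of "arg1 arg2 ..." into buf.
   Returns the number of bytes stored, or 0 once everything has been
   consumed (0 is the value flex expects for end of input). */
int args_read(char *buf, int max_size)
{
    int n = 0;
    assert(buf != NULL && max_size > 0);
    while (n < max_size && arg_cur < arg_end) {
        char c = (*arg_cur)[arg_off];
        if (c != '\0') {
            buf[n++] = c;
            arg_off++;
        } else {
            /* End of one argument: emit a separating space and move on. */
            buf[n++] = ' ';
            arg_cur++;
            arg_off = 0;
        }
    }
    return n;
}

/* In the definitions section of the .l file, inside %{ ... %}:
   #define YY_INPUT(buf, result, max_size) \
       ((result) = args_read((buf), (max_size)))
*/
```

With this in place, main would call args_init before yyparse, and the scanner would see the arguments as one space-separated stream.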
Related
I am new to compilers and am learning to make a calculator that reads multi-line input (one equation per line) from a .txt file. I am running into a segmentation fault.
YACC Code :
%{
#include <stdio.h>
#include <string.h>
#define YYSTYPE int /* the attribute type for Yacc's stack */
extern int yylval; /* defined by lex, holds attrib of cur token */
extern char yytext[]; /* defined by lex and holds most recent token */
extern FILE * yyin; /* defined by lex; lex reads from this file */
%}
%token NUM
%%
Begin : Line
| Begin Line
;
Line : Calc {printf("%s",$$); }
;
Calc : Expr {printf("Result = %d\n",$1);}
Expr : Fact '+' Expr { $$ = $1 + $3; }
| Fact '-' Expr { $$ = $1 - $3; }
| Fact '*' Expr { $$ = $1 * $3; }
| Fact '/' Expr { $$ = $1 / $3; }
| Fact { $$ = $1; }
| '-' Expr { $$ = -$2; }
;
Fact : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id : NUM { $$ = yylval; }
;
%%
void yyerror(char *mesg); /* this one is required by YACC */
main(int argc, char* *argv){
char ch;
if(argc != 2) {printf("useage: calc filename \n"); exit(1);}
if( !(yyin = fopen(argv[1],"r")) ){
printf("cannot open file\n");exit(1);
}
yyparse();
}
void yyerror(char *mesg){
printf("Bad Expression : %s\n", mesg);
exit(1); /* stop after the first error */
}
LEX Code :
%{
#include <stdio.h>
#include "y.tab.h"
int yylval; /*declared extern by yacc code. used to pass info to yacc*/
%}
letter [A-Za-z]
digit [0-9]
num ({digit})*
op "+"|"*"|"("|")"|"/"|"-"
ws [ \t\n]
other .
%%
{ws} { /* note, no return */ }
{num} { yylval = atoi(yytext); return NUM;}
{op} { return yytext[0];}
{other} { printf("bad%cbad%d\n",*yytext,*yytext); return '?'; }
%%
/* c functions called in the matching section could go here */
I am trying to print the expression along with the result.
Thanks In Advance.
In your parser, you have:
Line : Calc {printf("%s",$$); }
Now $$ is the semantic value which the rule is computing, and you haven't assigned anything to it. So it would not be unreasonable to assume that it is undefined, which would be bad, but in fact it does have a value because of the default rule $$ = $1;. All the same, it would be much more readable to write
printf("%s", $1);
But that's not correct, is it? After all, you have
#define YYSTYPE int
so all semantic types are integers. But you're telling printf that $1 is a string (%s). printf will believe you, so it will go ahead and try to dereference the int as though it were a char*, with predictable results (i.e., a segfault).
You are probably using a compiler which is clever enough to notice the fact that you are trying to print an int with a %s format code. But either you haven't asked the compiler to help you or you are ignoring its advice.
Always compile with warnings enabled. If you are using gcc or clang, that means putting -Wall in the command line. (If you are using some other compiler, find out how to produce warnings. It will be documented.) And then read the warnings and fix them before trying to run the program.
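To make the mismatch concrete, here is a minimal sketch in plain C, assuming the semantic value is a plain int as it is with #define YYSTYPE int. The function name format_result is hypothetical and only stands in for the printf call in the grammar action.

```c
#include <assert.h>
#include <stdio.h>

/* Format a calculator result into out.
   The value is an int, so the conversion has to be %d.
   Passing an int where %s expects a char* is undefined behaviour,
   and gcc -Wall reports it as a format warning. */
int format_result(char *out, size_t size, int value)
{
    assert(out != NULL && size > 0);
    return snprintf(out, size, "Result = %d\n", value);
}
```

In the grammar, the equivalent change is to use %d in the action for Line, or to drop that action altogether, since Calc already prints the result.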
There are several other errors and questionable practices in your code. Your grammar is inaccurate (why do you use Fact as the left-hand operand of every operator?). Because every rule has the form Fact op Expr, all operators are right-associative with the same precedence, so 2-3-4 is evaluated as 2-(3-4) and 2*3+4 as 2*(3+4). And despite your comment, your lexical scanner ignores newline characters, so there is no way the parser can know whether expressions are one per line, two per line, or spread over multiple lines; that will make it hard to use the calculator as a command-line tool.
There is no need to define the lex macro digit; (f)lex recognizes the Posix character class [[:digit:]] (and others, documented in the flex manual) automatically. Nor is it particularly useful to define the macro num. Overuse of lex macros makes your program harder to read; it is usually better to just write the patterns out in place:
[[:digit:]]+ { yylval = atoi(yytext); return NUM; }
which would be more readable and less work both for you and for anyone reading your code. (If your professor or tutor disagrees, I'd be happy to discuss the matter with them directly.)
I have built a trivial compiler using Flex and Bison which is supposed to recognize a simple string in a source file and I use the standard error stream to output a message if the string is recognized correctly.
Below is my code and my unexpected result.
This is the source file (testsource.txt) with the string I try to recognize:
\end{document}
This is the Flex file (UnicTextLang.l):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
extern int yyparse();
%}
%%
^\\end\{document\}$ { yyerror("end matched"); return END; }
/* skip whitespace */
[ \t] ;
/* anything else is an error */
. yyerror("invalid character");
%%
int main(int argc, char *argv[]) {
if ( argc < 3 )
yyerror("You need 2 args: inputFileName outputFileName");
else {
yyin = fopen(argv[1], "r");
yyout = fopen(argv[2], "w");
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
This is the Bison file (UnicTextLang.y):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
%}
%token END
%%
document:
END
|
;
%%
int yywrap(void) {
return 1;
}
void yyerror(char *s) {
fprintf(stderr, "%s\n", s); /* Prints to the standard error stream */
}
I run the following commands:
flex UnicTextLang.l
bison -dl -o y.tab.c UnicTextLang.y
gcc lex.yy.c y.tab.c -o UnicTextLang
UnicTextLang.exe testsource.txt output.txt
What I expect to see printed in the console is
end matched
But this is what I get:
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
What’s wrong?
This issue is caused by the end-of-line code for a Windows machine being two characters (\r\n) when on other systems it is one (\n).
This is explained in the flex manual:
‘r$’
an ‘r’, but only at the end of a line (i.e., just before a newline). Equivalent to ‘r/\n’.
Note that flex’s notion of “newline” is exactly whatever the C compiler used to compile flex interprets ‘\n’ as; in particular, on some DOS systems you must either filter out ‘\r’s in the input yourself, or explicitly use ‘r/\r\n’ for ‘r$’.
The quick solution is to change:
^\\end\{document\}$
to
^\\end\{document\}\r\n
However, if your expression is at the end-of-file without an end-of-line, which is possible in Windows, then you would have to specifically match that case also. Flex does permit the matching of end-of-file with:
<<EOF>>
but this will cause all kinds of other side effects and it is often easier not to anchor the pattern to the end (of line or file).
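The other option the manual mentions, filtering out the '\r' characters yourself, could look roughly like the sketch below. The function name strip_cr is hypothetical; the idea is to run it over each chunk of input before the scanner sees it, for instance from a custom YY_INPUT.

```c
#include <assert.h>
#include <stddef.h>

/* Remove every '\r' from buf[0..len) in place.
   Returns the new length, which is never larger than len. */
size_t strip_cr(char *buf, size_t len)
{
    size_t in, out = 0;
    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] != '\r')
            buf[out++] = buf[in];
    }
    return out;
}
```

With the carriage returns gone, the original pattern anchored with $ can stay as it is. One caveat if this is used inside YY_INPUT: a chunk made up only of '\r' characters would shrink to length 0, which flex treats as end of input, so real code would need to read another chunk in that case.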
I ran the following command in the Windows command prompt:
yacc -d calci.y
After successful execution it generates 2 files: calci.tab.c and calci.tab.h. But it should have generated y.tab.c and y.tab.h.
I am very new to lex and yacc, so I do not have an idea about the error.
Also, it gives me the following error when I try to run command:
cc lex.yy.c calci.tab.c -o out.exe:
error: calci.l:3:23: fatal error: y.tab.h: No such file or directory
compilation terminated.
Please give some suggestion.
yacc program:
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER
%%
program:
program expr '\n' { printf("%d\n", $2); }
|
;
expr:
INTEGER
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
lex program:
%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}
[-+\n] { return *yytext; }
[ \t] ; /* skip whitespace */
. yyerror("Unknown character");
%%
int yywrap(void) {
return 1;
}
Just accept that bison will name its output files based on the name of its input file.
Creating files called y.tab.c and y.tab.h is the legacy behaviour of the original yacc tool; with current bison versions, you can achieve compatible behaviour by supplying the -y command-line option to bison. But I don't recommend doing that for new code; it will also change some details of the parser's behaviour in order to be legacy-compatible, and if you don't have legacy code those behaviours may not be desirable.
Basing the names of the bison-generated files on the input files makes it possible to have more than one bison source file in the same directory. If you don't want to use the name of the source file, you can specify an explicit output file name with the -o option (and the --defines option if you want the header file's name to have a different prefix than the source file).
All that means you need to change the name of the file being included into the lexer, so the line will become
#include "calci.tab.h"
(assuming you don't use the -o/--defines options.)
Given a .l file like this:
%{
#include "y.tab.h"
%}
%%
[ \t\n]
"if" return IF_TOKEN ;
"while" return ELSE_TOKEN ;
. yyerror("Invalid Character");
%%
int yywrap(void){
return 1;
}
and a .y file like this:
%{
#include <stdio.h>
void yyerror(char *);
%}
%token IF_TOKEN ELSE_TOKEN MINUS_TOKEN DIGIT_TOKEN
%%
program :expr {printf("program Accepted!!!");};
expr : IF_TOKEN | DIGIT_TOKEN ;
%%
void yyerror(char *s){
fprintf(stderr, "%s\n", s);
}
int main(){
yyparse();
return 0;
}
I use these 3 commands to compile these 2 files (my lex file named p.l and my yacc file named p.y):
flex p.l
yacc -d p.y
gcc lex.yy.c y.tab.c
It compiles with no errors. But when I changed "return ELSE_TOKEN" to "return WHILE_TOKEN", I got this error and no output file:
p.l: In function ‘yylex’:
p.l:10:8: error: ‘WHILE_TOKEN’ undeclared (first use in this function)
"while" return WHILE_TOKEN ;
^
p.l:10:8: note: each undeclared identifier is reported only once for each function it appears in
Also when I change "while" to "else" and add a new rule like:
"for" return FOR_TOKEN ;
I get the same error. How can I correct the code to work correctly?
You didn't add:
%token WHILE_TOKEN FOR_TOKEN
to the grammar, so the header didn't contain a definition for WHILE_TOKEN or FOR_TOKEN, so the compilation of the lexical analyzer failed.
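For illustration, this is roughly the kind of declaration the generated header contains once the tokens are listed in the grammar. It is a sketch, not the output of any particular tool: the numeric values are assumptions, the exact form of the header varies between yacc implementations and versions, and token_name is a hypothetical helper added only to show the names being used from C.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: what the header might declare for
   %token IF_TOKEN ELSE_TOKEN WHILE_TOKEN FOR_TOKEN
   Token numbers sit above 255 so that they cannot clash with
   single-character tokens such as '+'. */
enum yytokentype {
    IF_TOKEN = 258,
    ELSE_TOKEN,
    WHILE_TOKEN,
    FOR_TOKEN
};

/* Hypothetical helper: map a token number back to its name,
   which can be handy when checking what the lexer returns. */
const char *token_name(int tok)
{
    switch (tok) {
    case IF_TOKEN:    return "IF_TOKEN";
    case ELSE_TOKEN:  return "ELSE_TOKEN";
    case WHILE_TOKEN: return "WHILE_TOKEN";
    case FOR_TOKEN:   return "FOR_TOKEN";
    default:          return "?";
    }
}
```

A name that is missing from the %token list never makes it into the header, so the lexer, which only sees the header, has nothing to resolve it against.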
I have recently tried using GNU Bison and Flex to write an interpreter. The text I want the interpreter to recognize is print "Hello" and I have tried the following:
flex file:
%{
#include <iostream>
using namespace std;
#define YY_DECL extern "C" int yylex()
#include "gbison.tab.h"
%}
%%
[ \t\n] ;
'\"' return QUOTE;
[a-zA-Z0-9]+ { yylval.sval = strdup(yytext); return STRING; }
%%
bison file:
%{
#include <cstdio>
#include <cstring>
#include <iostream>
using namespace std;
extern "C" int yylex();
extern "C" int yyparse();
extern "C" FILE* yyin;
void yyerror (const char* s);
%}
%union {
char* sval;
}
%token <sval> STRING
%token QUOTE
%%
str:
STRING QUOTE STRING QUOTE
{
if (strcmp($1, "print") == 0)
{
cout << $3 << flush;
}
if (strcmp($1, "println") == 0)
{
cout << $3 << endl;
}
}
;
%%
main(int argc, char* argv[])
{
FILE* input = fopen(argv[1], "r");
if (!input)
{
cout << "Bad input. Nonexistant file" << endl;
return -1;
}
yyin = input;
do
{
yyparse();
} while (!feof(yyin));
}
void yyerror(const char* s)
{
cout << "Error. " << s << endl;
exit(-1);
}
But when I pass print "hello" to the compiled program I get:
"Error. syntax error
I think that the issue is the STRING QUOTE STRING QUOTE but I am not sure. What exactly is going wrong? How would I get the interpreter to print hello?
The answers are below, but I hope the following is more generally useful, as fishing instruction.
There are a variety of debugging tools which would help you. In particular, flex provides the -d flag:
-d, --debug
makes the generated scanner run in "debug" mode. Whenever a pattern is recognized and the global variable yy_flex_debug is non-zero (which is the default), the scanner will write to stderr a line… (flex manual)
bison also provides a debug facility. (bison manual)
There are several means to enable compilation of trace facilities:
the macro YYDEBUG…
the option -t (POSIX Yacc compliant)…
the option --debug (Bison extension)…
the directive %debug…
We suggest that you always enable the debug option so that debugging
is always possible.
…
Once you have compiled the program with trace facilities, the way to
request a trace is to store a nonzero value in the variable yydebug.
You can do this by making the C code do it (in main, perhaps), or you
can alter the value with a C debugger.
Also, remember that flex inserts an automatic rule which causes any otherwise unrecognized character to be echoed to the output. ("By default, any text not matched by a flex scanner is copied to the output" -- Some simple examples) That's why you have the extra " in the error message being printed by your program:
"Error. syntax error
^
That's a bit subtle, though. Tracing flex would have shown you that more directly.
So, finally, the problem(s):
The flex pattern '\"' does not match a ". It matches '"', because single quotes are not special to flex. That's definitely why your parse fails.
Fixing that will let your program parse a single command, but it will generate a syntax error if you try to give it two print commands in the same input. That's because bison always parses until it receives an END token from the lexer, and the lexer (by default) only provides an END token when it reaches the end of the input. You can change
the lexer behaviour (by sending END in other circumstances, for example a new-line) (not recommended)
the parser behaviour (by using ACCEPT) (possible, but rarely necessary)
the grammar, so that it recognizes any number of statements. (recommended)