"first use" error when changing the code in a lex file - yacc

Given a .l file like this:
%{
#include "y.tab.h"
%}
%%
[ \t\n]
"if" return IF_TOKEN ;
"while" return ELSE_TOKEN ;
. yyerror("Invalid Character");
%%
int yywrap(void){
return 1;
}
and a .y file like this:
%{
#include <stdio.h>
void yyerror(char *);
%}
%token IF_TOKEN ELSE_TOKEN MINUS_TOKEN DIGIT_TOKEN
%%
program :expr {printf("program Accepted!!!");};
expr : IF_TOKEN | DIGIT_TOKEN ;
%%
void yyerror(char *s){
fprintf(stderr, "%s\n", s);
}
int main(){
yyparse();
return 0;
}
I use these 3 commands to compile these 2 files (my lex file named p.l and my yacc file named p.y):
flex p.l
yacc -d p.y
gcc lex.yy.c y.tab.c
This compiles with no errors. But when I changed "return ELSE_TOKEN" to "return WHILE_TOKEN", I got this error and no output file:
p.l: In function ‘yylex’:
p.l:10:8: error: ‘WHILE_TOKEN’ undeclared (first use in this function)
"while" return WHILE_TOKEN ;
^
p.l:10:8: note: each undeclared identifier is reported only once for each function it appears in
Also when I change "while" to "else" and add a new rule like:
"for" return FOR_TOKEN ;
I get the same error. How can I correct the code to work correctly?

You didn't add:
%token WHILE_TOKEN FOR_TOKEN
to the grammar, so the generated header (y.tab.h) contains no definition for WHILE_TOKEN or FOR_TOKEN, and the lexical analyzer fails to compile. After adding the declaration, rerun yacc -d p.y so the header is regenerated.
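
To make the dependency concrete, here is a minimal sketch in plain C of what the lexer relies on. The enum imitates the token definitions a generated header provides once all six names are listed in %token; the numeric values and the helper keyword_token are assumptions made for illustration, not output copied from yacc.

```c
#include <string.h>

/* Roughly what the generated header supplies after the %token line is
   extended. The numeric values are illustrative assumptions. */
enum yytokentype {
    IF_TOKEN = 258,
    ELSE_TOKEN,
    MINUS_TOKEN,
    DIGIT_TOKEN,
    WHILE_TOKEN,
    FOR_TOKEN
};

/* Hypothetical hand-written stand-in for the keyword rules in p.l.
   Returns the token code, or 0 if the text is not a keyword. */
int keyword_token(const char *text) {
    if (strcmp(text, "if") == 0)    return IF_TOKEN;
    if (strcmp(text, "else") == 0)  return ELSE_TOKEN;
    if (strcmp(text, "while") == 0) return WHILE_TOKEN;
    if (strcmp(text, "for") == 0)   return FOR_TOKEN;
    return 0;
}
```

If WHILE_TOKEN or FOR_TOKEN were deleted from the enum, keyword_token would fail to compile with the same "undeclared (first use in this function)" message shown in the question.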

Simple Regex pattern unmatched with Flex/Bison (Lex/Yacc)

I have built a trivial compiler using Flex and Bison which is supposed to recognize a simple string in a source file and I use the standard error stream to output a message if the string is recognized correctly.
Below is my code and my unexpected result.
This is the source file (testsource.txt) with the string I try to recognize:
\end{document}
This is the Flex file (UnicTextLang.l):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
extern int yyparse();
%}
%%
^\\end\{document\}$ { yyerror("end matched"); return END; }
/* skip whitespace */
[ \t] ;
/* anything else is an error */
. yyerror("invalid character");
%%
int main(int argc, char *argv[]) {
if ( argc < 3 )
yyerror("You need 2 args: inputFileName outputFileName");
else {
yyin = fopen(argv[1], "r");
yyout = fopen(argv[2], "w");
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
This is the Bison file (UnicTextLang.y):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
%}
%token END
%%
document:
END
|
;
%%
int yywrap(void) {
return 1;
}
void yyerror(char *s) {
fprintf(stderr, "%s\n", s); /* Prints to the standard error stream */
}
I run the following commands:
flex UnicTextLang.l
bison -dl -o y.tab.c UnicTextLang.y
gcc lex.yy.c y.tab.c -o UnicTextLang
UnicTextLang.exe testsource.txt output.txt
What I expect to see printed in the console is
end matched
But this is what I get:
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
What’s wrong?
This issue is caused by the end-of-line code for a Windows machine being two characters (\r\n) when on other systems it is one (\n).
This is explained in the flex manual:
‘r$’
an ‘r’, but only at the end of a line (i.e., just before a newline). Equivalent to ‘r/\n’.
Note that flex’s notion of “newline” is exactly whatever the C compiler used to compile flex interprets ‘\n’ as; in particular, on some DOS systems you must either filter out ‘\r’s in the input yourself, or explicitly use ‘r/\r\n’ for ‘r$’.
The quick solution is to change:
^\\end\{document\}$
to
^\\end\{document\}\r\n
However, if your expression is at the end-of-file without an end-of-line, which is possible in Windows, then you would have to specifically match that case also. Flex does permit the matching of end-of-file with:
<<EOF>>
but this will cause all kinds of other side effects and it is often easier not to anchor the pattern to the end (of line or file).
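
The flex manual's other suggestion, filtering out the '\r' characters yourself, can be sketched as a small C helper. The function name strip_cr is hypothetical, and the sketch assumes the input has already been read into a NUL-terminated buffer before it is handed to the scanner.

```c
#include <stddef.h>

/* Removes every '\r' that is immediately followed by '\n', in place.
   Returns the new length of the string. */
size_t strip_cr(char *s) {
    size_t r = 0;  /* read position */
    size_t w = 0;  /* write position */
    while (s[r] != '\0') {
        if (s[r] == '\r' && s[r + 1] == '\n') {
            r++;           /* drop the '\r', keep the '\n' */
            continue;
        }
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

With the carriage returns gone, the original pattern anchored with $ would see a plain '\n' after \end{document}.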

y.tab.c and y.tab.h files are not generated on win64

I ran the following command in the Windows command prompt:
yacc -d calci.y
After successful execution it generates 2 files: calci.tab.c and calci.tab.h. But it should have generated y.tab.c and y.tab.h.
I am very new to lex and yacc, so I do not have an idea about the error.
Also, it gives me the following error when I try to run command:
cc lex.yy.c calci.tab.c -o out.exe:
error: calci.l:3:23: fatal error: y.tab.h: No such file or directory
compilation terminated.
Please give some suggestion.
yacc program:--->>
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER
%%
program:
program expr '\n' { printf("%d\n", $2); }
|
;
expr:
INTEGER
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
lex program:-->>>>
%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}
[-+\n] { return *yytext; }
[ \t] ; /* skip whitespace */
. yyerror("Unknown character");
%%
int yywrap(void) {
return 1;
}
Just accept that bison will name its output files based on the name of its input file.
Creating files called y.tab.c and y.tab.h is the legacy behaviour of the original yacc tool; with current bison versions, you can achieve compatible behaviour by supplying the -y command-line option to bison. But I don't recommend doing that for new code; it will also change some details of the parser's behaviour in order to be legacy-compatible, and if you don't have legacy code those behaviours may not be desirable.
Basing the names of the bison-generated files on the input files makes it possible to have more than one bison source file in the same directory. If you don't want to use the name of the source file, you can specify an explicit output file name with the -o option (and the --defines option if you want the header file's name to have a different prefix than the source file).
All that means you need to change the name of the file being included into the lexer, so the line will become
#include "calci.tab.h"
(assuming you don't use the -o/--defines options.)
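
As a rough illustration of the naming rule described above, the following C sketch derives the default output names from a plain .y file name. The helper default_bison_names is hypothetical, and it deliberately handles only names ending in ".y" with no directory part.

```c
#include <stdio.h>
#include <string.h>

/* Builds "<base>.tab.c" and "<base>.tab.h" from "<base>.y".
   Returns 0 on success, -1 if the name does not end in ".y"
   or an output buffer is too small. */
int default_bison_names(const char *input, char *c_name, char *h_name,
                        size_t size) {
    size_t len = strlen(input);
    if (len < 3 || strcmp(input + len - 2, ".y") != 0)
        return -1;
    int base = (int)(len - 2);  /* length of the name without ".y" */
    if (snprintf(c_name, size, "%.*s.tab.c", base, input) >= (int)size)
        return -1;
    if (snprintf(h_name, size, "%.*s.tab.h", base, input) >= (int)size)
        return -1;
    return 0;
}
```

For "calci.y" this yields calci.tab.c and calci.tab.h, which is why the #include in the lexer has to name calci.tab.h.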

Runtime "syntax error" from lex and yacc

I cannot figure out why I am getting these results.
++
+add
+syntax error 2
++
+add
+syntax error 4
The ++ is my input and lex echoes each character and yacc prints add whenever it gets a +. It's giving me this error on every other + it gets. Doesn't matter how I give the input, I get the same results if I hit enter on every +.
lex
%{
#include "y.tab.h"
int chars = 0;
%}
%%
"+" {ECHO; chars++; return ADD;}
. {ECHO; chars++;}
\n {ECHO;}
%%
yacc
%{
#include <stdio.h>
extern int chars;
void yyerror (const char *str) {
printf ("%s %d\n", str, chars);
}
%}
%token ADD
%%
symbol : ADD {printf ("add\n");}
;
%%
int main () {
while (1) {
yyparse ();
}
}
Your grammar only accepts a 'sentence' that consists of a single token, +. When you type a second +, you induce a syntax error; your grammar doesn't allow ADD followed by ADD. The next token after the + must be EOF for the grammar to accept your input. (Because of the . and \n rules, you can type all sorts of other characters, but there can only be one + in the input.)
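
The alternating "add" / "syntax error" pattern can be reproduced with a small hand-written model in C. This is only a sketch of the behaviour described above, not the generated parser: next_token, parse_once and simulate are hypothetical names, and the token values are illustrative.

```c
#include <stddef.h>

/* Token codes for this sketch only; a real build takes them from y.tab.h. */
enum { TOK_EOF = 0, TOK_ADD = 258 };

/* Stand-in for yylex(): skips every character except '+'. */
static int next_token(const char **p) {
    while (**p != '\0') {
        char c = *(*p)++;
        if (c == '+')
            return TOK_ADD;
    }
    return TOK_EOF;
}

/* One parse of "symbol : ADD ;". Returns 0 on accept, 1 on syntax error. */
static int parse_once(const char **p, int *adds) {
    if (next_token(p) != TOK_ADD)
        return 1;                 /* the first token must be ADD */
    (*adds)++;                    /* this is where the action prints "add" */
    /* Only end of input may follow; a second ADD is consumed and rejected. */
    return next_token(p) == TOK_EOF ? 0 : 1;
}

/* Models the while loop around yyparse() until the input is used up. */
void simulate(const char *input, int *adds, int *errors) {
    const char *p = input;
    *adds = 0;
    *errors = 0;
    while (*p != '\0') {
        if (parse_once(&p, adds) != 0)
            (*errors)++;
    }
}
```

Feeding it "++\n++" counts two actions and two errors, matching the output in the question, while a lone "+" is accepted without an error.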

Why is yyerror() called in a sample program?

curs.l :
%{
#include <stdlib.h>
#include "tree.c"
#include "yycurs.h"
%}
L [a-zA-Z_]
D [0-9]
D4 [0-3]
IDENTIFIER ({L})({L}|{D})*
INT4 {D4}+'q'
INT {D}+
%%
{IDENTIFIER} {return VARIABLE;}
%%
int yywrap(void){
return 0;
}
curs.y:
%{
#include <stdio.h>
void yyerror(char*);
int yylex(void);
%}
%token VARIABLE INTEGER
%%
var: VARIABLE {printf($1);};
%%
void yyerror(char *s){
fprintf(stderr, "11\n");
fprintf(stderr, "%s\n", s);
}
int main(void){
yyparse();
return 0;
}
When I run my compiled program, I get this result:
./curs
ff //my input
//result
ff //my input
11 //result
syntax error //result
evgeniy#evgeniy-desktop:~/documents/compilers$
Can anybody explain why 'syntax error' appears?
Thanks in advance.
Your grammar defines a valid file as consisting of exactly one VARIABLE. To allow more than one, you need to introduce a recursive rule.
%start vars
%%
var: VARIABLE {printf($1);};
vars: var
| vars var;
%%
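
What the left-recursive rule accepts can be written out as a plain loop. The C sketch below is a hand-written model of "vars: var | vars var", not generated parser code; parse_vars and the token values are assumptions for illustration.

```c
#include <stddef.h>

/* Token codes for this sketch only; a real build takes them from the
   generated header. */
enum { TOK_EOF = 0, TOK_VARIABLE = 258 };

/* Returns the number of VARIABLE tokens if the sequence matches
   "vars: var | vars var", or -1 on a syntax error. */
int parse_vars(const int *tokens, size_t n) {
    size_t i = 0;
    int count = 0;
    if (n == 0 || tokens[0] != TOK_VARIABLE)
        return -1;                        /* at least one var is required */
    while (i < n && tokens[i] == TOK_VARIABLE) {
        count++;                          /* one reduction of "var" */
        i++;
    }
    if (i < n && tokens[i] != TOK_EOF)
        return -1;                        /* anything else is an error */
    return count;
}
```

Two VARIABLE tokens followed by end of input are accepted here, whereas the original single-token grammar would report a syntax error at the second one.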

Why am I getting conflicts: 1 shift/reduce

I'm new to bison and I'm getting a "conflicts: 1 shift/reduce" error. Can anyone shed some light on this?
Here's the y file.
test.y:
%{
#include <stdio.h>
#include <string.h>
#define YYERROR_VERBOSE
#define YYDEBUG 1
void yyerror(const char *str);
int yywrap();
%}
%union
{
int integer;
char *string;
}
%token <string> VAR_LOCAL
%token <integer> LIT_NUMBER
%token <string> LIT_STRING
%token WS_LINEBRK
//%token SYMB_EQL
%token SYMB_PLUS
%token SYMB_MINUS
%token SYMB_MUL
%token SYMB_DIV
%%
/*
// Sample input
num = 10
str = "this is a string"
*/
inputs: /* empty token */
| literal
| variable
| inputs stmt WS_LINEBRK
;
stmt: variable "=" exps
;
exps: variable op literal
| variable op variable
| literal op literal
| literal op variable
;
op: SYMB_PLUS | SYMB_MINUS | SYMB_MUL | SYMB_DIV ;
variable: VAR_LOCAL
{
printf("variable: %s\n", $1);
}
;
literal:
number | string
;
string: LIT_STRING
{
printf("word: %s\n", $1);
}
;
number: LIT_NUMBER
{
printf("number: %d\n", $1);
}
;
%%
void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}
int yywrap()
{
return 1;
}
main()
{
yyparse();
}
Here's the lex file
test.l:
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
int line_no = 0;
%}
%%
[a-z][a-zA-Z0-9]* {
// local variable
yylval.string=strdup(yytext);
return VAR_LOCAL;
}
[0-9]+ {
//number literal
yylval.integer=atoi(yytext);
return LIT_NUMBER;
}
= return SYMB_EQL;
\+ return SYMB_PLUS;
\- return SYMB_MINUS;
\* return SYMB_MUL;
\/ return SYMB_DIV;
\"[-+\!\.a-zA-Z0-9' ]+\" {
// word literal
yylval.string=strdup(yytext);
return LIT_STRING;
}
\n {
// line break
printf("\n");
return WS_LINEBRK;
}
[ \t]+ /* ignore whitespace */;
%%
bison -v test.y (or bison --report=all test.y) will write a file test.output with a detailed description of the generated state machine, which lets you see what's going on, such as the state where the shift/reduce conflict occurs.
In your case, the problem is in the start state (corresponding to your start nonterminal, inputs). Say the first token is VAR_LOCAL. There are two things your parser could do:
It could match the variable case.
It could also match the inputs stmt WS_LINEBRK case: inputs matches the empty string (first line), and stmt matches variable "=" exps.
With the one token of lookahead that bison parsers use, there's no way to tell. You need to change your grammar to get rid of this case.
To fix the grammar, as Fabian has suggested, move the variable and literal alternatives from inputs to the end of exps:
inputs:
| variable
| literal
exps:
...
| variable
| literal
That allows syntax such as x = y and x = "aliteral".
To allow for empty input lines, change the /* empty token */ rule to WS_LINEBRK:
inputs: WS_LINEBRK
| stmt WS_LINEBRK
| inputs stmt WS_LINEBRK
;
On another note, the scanner still returns SYMB_EQL, but the parser no longer defines it (it's commented out), so something needs to be done for the code to compile. One option is to uncomment the %token definition and use SYMB_EQL instead of the literal "=" in the parser .y file.