curs.l:
%{
#include <stdlib.h>
#include "tree.c"
#include "yycurs.h"
%}
L [a-zA-Z_]
D [0-9]
D4 [0-3]
IDENTIFIER ({L})({L}|{D})*
INT4 {D4}+'q'
INT {D}+
%%
{IDENTIFIER} {return VARIABLE;}
%%
int yywrap(void){
return 0;
}
curs.y:
%{
#include <stdio.h>
void yyerror(char*);
int yylex(void);
%}
%token VARIABLE INTEGER
%%
var: VARIABLE {printf($1);};
%%
void yyerror(char *s){
fprintf(stderr, "11\n");
fprintf(stderr, "%s\n", s);
}
int main(void){
yyparse();
return 0;
}
When I run my compiled program, I get this result:
./curs
ff //my input
//result
ff //my input
11 //result
syntax error //result
evgeniy#evgeniy-desktop:~/documents/compilers$
Can anybody explain why 'syntax error' appears?
Thanks in advance.
Your grammar defines a valid input as exactly one VARIABLE, so the second VARIABLE token is reported as a syntax error. To accept more than one, you need to introduce a recursive rule:
%start vars
%%
var: VARIABLE {printf($1);};
vars: var
| vars var;
%%
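The difference between the two grammars can be sketched in plain C. This is only an illustration of which token sequences each grammar accepts, not what bison generates; the token codes and the function names accepts_var and accepts_vars are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch; a real parser takes them
   from the header that yacc/bison generates. 0 marks end of input. */
enum { END_OF_INPUT = 0, VARIABLE = 258, INTEGER = 259 };

/* Original grammar:   var: VARIABLE ;
   Accepts exactly one VARIABLE followed by end of input. */
int accepts_var(const int *tokens)
{
    assert(tokens != NULL);
    return tokens[0] == VARIABLE && tokens[1] == END_OF_INPUT;
}

/* Recursive grammar:  vars: var | vars var ;
   Accepts one or more VARIABLE tokens followed by end of input.
   The left recursion behaves like this loop. */
int accepts_vars(const int *tokens)
{
    size_t i = 0;

    assert(tokens != NULL);
    while (tokens[i] == VARIABLE)
        i++;
    return i >= 1 && tokens[i] == END_OF_INPUT;
}
```

With the input from the question (two identifiers), accepts_var rejects the sequence at the second token, which is the point where the generated parser reports the syntax error, while accepts_vars accepts it.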
I am working with lex and yacc, and I wonder how I can pass the value of $1 to a C variable so that I can print it in main?
file.y:
%{
#include <stdio.h>
%}
...
%token<num> NUM
%%
expression : NUM { printf("number:%d;\n", $1);}
;
%%
int main(){
yyparse();
printf("number is:%d;\n", var);
return 0;
}
Would doing something like '$1=var' pass the value to the variable?
I have built a trivial compiler using Flex and Bison which is supposed to recognize a simple string in a source file and I use the standard error stream to output a message if the string is recognized correctly.
Below is my code and my unexpected result.
This is the source file (testsource.txt) with the string I try to recognize:
\end{document}
This is the Flex file (UnicTextLang.l):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
extern int yyparse();
%}
%%
^\\end\{document\}$ { yyerror("end matched"); return END; }
/* skip whitespace */
[ \t] ;
/* anything else is an error */
. yyerror("invalid character");
%%
int main(int argc, char *argv[]) {
if ( argc < 3 )
yyerror("You need 2 args: inputFileName outputFileName");
else {
yyin = fopen(argv[1], "r");
yyout = fopen(argv[2], "w");
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
This is the Bison file (UnicTextLang.y):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
int yylex(void);
/* "Connect" with the output file */
extern FILE *yyout;
%}
%token END
%%
document:
END
|
;
%%
int yywrap(void) {
return 1;
}
void yyerror(char *s) {
fprintf(stderr, "%s\n", s); /* Prints to the standard error stream */
}
I run the following commands:
flex UnicTextLang.l
bison -dl -o y.tab.c UnicTextLang.y
gcc lex.yy.c y.tab.c -o UnicTextLang
UnicTextLang.exe testsource.txt output.txt
What I expect to see printed in the console is
end matched
But this is what I get:
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
invalid character
What’s wrong?
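This happens because a line ending on Windows is two characters (\r\n), whereas on other systems it is one (\n). The $ anchor matches only immediately before \n, so the \r left at the end of the line keeps the pattern from matching.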
This is explained in the flex manual:
‘r$’
an ‘r’, but only at the end of a line (i.e., just before a newline). Equivalent to ‘r/\n’.
Note that flex’s notion of “newline” is exactly whatever the C compiler used to compile flex interprets ‘\n’ as; in particular, on some DOS systems you must either filter out ‘\r’s in the input yourself, or explicitly use ‘r/\r\n’ for ‘r$’.
The quick solution is to change:
^\\end\{document\}$
to
^\\end\{document\}\r\n
However, if your expression is at the end-of-file without an end-of-line, which is possible in Windows, then you would have to specifically match that case also. Flex does permit the matching of end-of-file with:
<<EOF>>
but this will cause all kinds of other side effects and it is often easier not to anchor the pattern to the end (of line or file).
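The other option the flex manual mentions, filtering out the '\r' characters yourself, can be sketched as a small C helper. This is a minimal sketch under assumptions: the name strip_carriage_returns is made up, and how the cleaned text reaches the scanner (for example by preprocessing the file, or by redefining flex's YY_INPUT macro) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Removes every '\r' from the NUL-terminated string in buf, in place,
   and returns the new length. After this, "\r\n" line endings have
   become plain "\n", so a pattern anchored with $ can match. */
size_t strip_carriage_returns(char *buf)
{
    size_t src = 0, dst = 0;

    assert(buf != NULL);
    for (; buf[src] != '\0'; src++) {
        if (buf[src] != '\r')
            buf[dst++] = buf[src];
    }
    buf[dst] = '\0';
    return dst;
}
```

With this approach the original pattern ^\\end\{document\}$ can stay unchanged, at the cost of an extra pass over the input.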
I am writing a YACC program that defines a CFG for the vowels in a given string. My attempt is as follows:
%{
#include <stdio.h>
%}
%union{
char c;
}
%token <c> VOW
%%
cha : 'a' { printf("a\n"); }
| 'e' {printf("e\n");}
| 'i' {printf("i\n");}
| 'o' {printf("o\n");}
| 'u' {printf("u\n");}
;
%%
int main(void) {return yyparse();}
int yylex(void) {return getchar();}
void yyerror(char *s) {fprintf(stderr, "%s\n",s);}
Is this a correct definition of a CFG for vowels?
You don't need a context-free grammar for your problem, only a regular expression. You're using the wrong tool for the job. It is three lines in flex(1):
%%
[aeiou] printf("%s\n", yytext);
.|\n ;
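For comparison, here is roughly the same single-pass scan written by hand in C. It is only a sketch: the function name extract_vowels and the choice to collect the vowels in a buffer instead of printing them are assumptions made for illustration. The scan needs no stack and no nesting, which is why a regular expression is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the lowercase vowels of `in` to `out` (the job of the
   [aeiou] rule) and skips every other character (the job of the
   catch-all rule). Vowels that do not fit in `out` are dropped.
   Returns the number of vowels copied; `out` is NUL-terminated. */
size_t extract_vowels(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    for (; *in != '\0'; in++) {
        if (strchr("aeiou", *in) != NULL && n + 1 < out_size)
            out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```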
Given a .l file like this:
%{
#include "y.tab.h"
%}
%%
[ \t\n]
"if" return IF_TOKEN ;
"while" return ELSE_TOKEN ;
. yyerror("Invalid Character");
%%
int yywrap(void){
return 1;
}
and a .y file like this:
%{
#include <stdio.h>
void yyerror(char *);
%}
%token IF_TOKEN ELSE_TOKEN MINUS_TOKEN DIGIT_TOKEN
%%
program :expr {printf("program Accepted!!!");};
expr : IF_TOKEN | DIGIT_TOKEN ;
%%
void yyerror(char *s){
fprintf(stderr, "%s\n", s);
}
int main(){
yyparse();
return 0;
}
I use these 3 commands to compile these 2 files (my lex file named p.l and my yacc file named p.y):
flex p.l
yacc -d p.y
gcc lex.yy.c y.tab.c
It compiles with no errors. But when I change "return ELSE_TOKEN" to "return WHILE_TOKEN", I get this error and no output file:
p.l: In function ‘yylex’:
p.l:10:8: error: ‘WHILE_TOKEN’ undeclared (first use in this function)
"while" return WHILE_TOKEN ;
^
p.l:10:8: note: each undeclared identifier is reported only once for each function it appears in
Also when I change "while" to "else" and add a new rule like:
"for" return FOR_TOKEN ;
I get the same error. How can I correct the code to work correctly?
You didn't add:
%token WHILE_TOKEN FOR_TOKEN
to the grammar, so the generated header (y.tab.h) didn't contain a definition for WHILE_TOKEN or FOR_TOKEN, and the compilation of the lexical analyzer failed.
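To see why the compiler says "undeclared": each name listed after %token becomes a C constant in the generated header, and return WHILE_TOKEN; in the scanner is ordinary C that needs that constant. The sketch below only imitates the idea in plain C. The numeric values and the helper token_name are illustrative assumptions; the real header's layout and numbering depend on the yacc implementation and version.

```c
#include <stddef.h>

/* Stand-in for what the generated header provides once the grammar
   declares:  %token IF_TOKEN ELSE_TOKEN MINUS_TOKEN DIGIT_TOKEN
              %token WHILE_TOKEN FOR_TOKEN
   The values here are illustrative only. */
enum {
    IF_TOKEN = 258,
    ELSE_TOKEN,
    MINUS_TOKEN,
    DIGIT_TOKEN,
    WHILE_TOKEN,   /* exists only because the grammar declares it */
    FOR_TOKEN
};

/* Hypothetical helper: maps a token code back to its name, or NULL
   for a code that was never declared. */
const char *token_name(int tok)
{
    switch (tok) {
    case IF_TOKEN:    return "IF_TOKEN";
    case ELSE_TOKEN:  return "ELSE_TOKEN";
    case MINUS_TOKEN: return "MINUS_TOKEN";
    case DIGIT_TOKEN: return "DIGIT_TOKEN";
    case WHILE_TOKEN: return "WHILE_TOKEN";
    case FOR_TOKEN:   return "FOR_TOKEN";
    default:          return NULL;
    }
}
```

Deleting WHILE_TOKEN from the enum makes the switch fail to compile with the same kind of "undeclared" error as in the question.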
I'm new to bison and I'm getting a "conflicts: 1 shift/reduce" error. Can anyone shed some light on this?
Here's the y file.
test.y:
%{
#include <stdio.h>
#include <string.h>
#define YYERROR_VERBOSE
#define YYDEBUG 1
void yyerror(const char *str);
int yywrap();
%}
%union
{
int integer;
char *string;
}
%token <string> VAR_LOCAL
%token <integer> LIT_NUMBER
%token <string> LIT_STRING
%token WS_LINEBRK
//%token SYMB_EQL
%token SYMB_PLUS
%token SYMB_MINUS
%token SYMB_MUL
%token SYMB_DIV
%%
/*
// Sample input
num = 10
str = "this is a string"
*/
inputs: /* empty token */
| literal
| variable
| inputs stmt WS_LINEBRK
;
stmt: variable "=" exps
;
exps: variable op literal
| variable op variable
| literal op literal
| literal op variable
;
op: SYMB_PLUS | SYMB_MINUS | SYMB_MUL | SYMB_DIV ;
variable: VAR_LOCAL
{
printf("variable: %s\n", $1);
}
;
literal:
number | string
;
string: LIT_STRING
{
printf("word: %s\n", $1);
}
;
number: LIT_NUMBER
{
printf("number: %d\n", $1);
}
;
%%
void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}
int yywrap()
{
return 1;
}
main()
{
yyparse();
}
Here's the lex file
test.l:
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
int line_no = 0;
%}
%%
[a-z][a-zA-Z0-9]* {
// local variable
yylval.string=strdup(yytext);
return VAR_LOCAL;
}
[0-9]+ {
//number literal
yylval.integer=atoi(yytext);
return LIT_NUMBER;
}
= return SYMB_EQL;
\+ return SYMB_PLUS;
\- return SYMB_MINUS;
\* return SYMB_MUL;
\/ return SYMB_DIV;
\"[-+\!\.a-zA-Z0-9' ]+\" {
// word literal
yylval.string=strdup(yytext);
return LIT_STRING;
}
\n {
// line break
printf("\n");
return WS_LINEBRK;
}
[ \t]+ /* ignore whitespace */;
%%
bison -v test.y (or bison --report=all test.y) will write a file test.output with a detailed description of the generated state machine that allows you to see what's going on - such as the state where the shift/reduce conflict occurs.
In your case, the problem is in the start state (corresponding to your start nonterminal, inputs). Say the first token is VAR_LOCAL. There are two things your parser could do:
It could match the variable case.
It could also match the inputs stmt WS_LINEBRK case: inputs matches the empty string (first line), and stmt matches variable "=" exps.
With the one token of lookahead that bison parsers use, there's no way to tell. You need to change your grammar to get rid of this case.
To fix the grammar, as Fabian has suggested, move the variable and literal alternatives out of inputs and append them to the end of exps:
inputs:
| variable
| literal
exps:
...
| variable
| literal
That allows syntax such as x = y and x = "aliteral".
To allow for empty input lines, change the /* empty token */ rule to WS_LINEBRK:
inputs: WS_LINEBRK
| stmt WS_LINEBRK
| inputs stmt WS_LINEBRK
;
On another note, the scanner still returns SYMB_EQL, but the parser no longer declares it (the %token line is commented out), so something needs to be done in order to compile. One option is to uncomment the %token declaration and use SYMB_EQL instead of the literal "=" in the parser .y file.