i'm trying to run this .y file
%{
#include <stdlib.h>
#include <stdio.h>
int yylex();
int yyerror();
%}
%start BEGIN
%%
BEGIN: 'a' | BEGIN 'a'
%%
int yylex(){
return getchar();
}
int yyerror(char* s){
fprintf(stderr, "*** ERROR: %s\n", s);
return 0;
}
int main(int argn, char **argv){
yyparse();
return 0;
}
It's a simple program in bison, the syntax seems to me correct, but always get the Syntax error problem ...
Thanks for your help.
The lexer function yylex needs to return 0 to indicate the end of the input. However, your implementation simply passes through the value returned by getchar, which will be EOF (normally -1).
Also, your input is almost certain to include a newline character, which will also be passed through to the parser.
Since the parser recognizes neither \n nor EOF, it produces an error when it receives one of them.
At a minimum, you would need to modify yylex to correctly respond to end of input:
int yylex(void) {
int ch = getchar();
return (ch == EOF) ? 0 : ch;
}
But you will still have to deal with newline charactets, either by handling them in your lexer (possibly ignoring them or possibly returning an end of input imdication), or by handling them in your grammar.
Note that bison/yacc-generated parsers always parse the entire input stream, not just the longest sequence satisfying the grammar. That can be adjusted with some work -- see the documentation for the YYACCEPT special action -- but the standard behaviour is usually what is desired when parsing.
By the way, please use standard style conventions in your bison/yacc grammars, in order to avoid problems and in order to avoid confusing readers. Normally we reserve UPPER_CASE for terminal symbols, since those are also used as compile-time constants in the lexer. Non-terminals are usually written in lower_case although some prefer to use CamelCase. For the terminals, you need to avoid the use of names reserved by the standard library (such as EOF) or by (f)lex (BEGIN) or bison/yacc (END). There are lists of reserved names in the manuals.
Related
Im having some trouble with my analizer. I.m trying to use yytext inside my yyerror but it shows me this error, can you help me?
You can't use yytext in your parser because it is defined by the lexer.
Indeed, you normally shouldn't use yytext in your parser because its value is not meaningful to the parse. Your attempt to use it to provide context in error messages is just about the only reasonable use, and even then there is a certain ambiguity because you can't tell whether the erroneous token is the one currently in yytext or the previous token, which was overwritten when the parser obtained its lookahead token.
In any case, if you want to refer to yytext inside your parser, you'll need to declare it, which will normally require putting
extern char* yytext;
into your bison grammar file. Since the only place you can reasonably use yytext is yyerror, you might change the definition of that function to:
void yyerror(const char* msg) {
extern char* yytext;
fprintf(stderr, "%s at line %d near '%s'\n", msg, nLineas, yytext);
}
Note that you can get flex to track line numbers automatically, so you don't need to track your own nLineas variable. Just add
%option yylineno
at the top of your flex file, and the global variable yylineno will automatically be maintained during lexical analysis. If you want to use yylineno in your parser, you'll need to add an an extern declaration for it as well:
extern int yylineno;
Again, using yylineno in the parser may be imprecise because it might refer to the line number of the token following the error, which might be on a different line from the error (and might even be separated from the error by many lines of comments).
As an alternative to using external declarations of yytext and yylineno, you are free to put the implementation of yyerror inside the scanner definition instead of the grammar definition. Your grammar file should already have a forward declaration of yyerror, so it doesn't matter which file it's placed in. If you put it into the scanner file, global scanner variables will already be declared.
I'm writing a basic parser that reads form stdin and prints results to stdout. The problem is that I'm having troubles with this grammar:
%token WORD NUM TERM
%%
stmt: /* empty */
| word word term { printf("[stmt]\n"); }
| word number term { printf("[stmt]\n"); }
| word term
| number term
;
word: WORD { printf("[word]\n"); }
;
number: NUM { printf("[number]\n"); }
;
term: TERM { printf("[term]\n"); /* \n */}
;
%%
When I run the program, I and type: hello world\n The output is (as I expected) [word] [word] [term] [stmt]. So far, so good, but then if I type: hello world\n (again), I get syntax error [word][term].
When I type hello world\n (for the third time) it works, then it fails again, then it works, and so on and do forth.
Am I missing something obvious in here?
(I have some experience on hand rolled compilers, but I've not used lex/yacc et. al.)
This is the main func:
int main() {
do {
yyparse();
} while(!feof(yyin));
return 0;
}
Any help would be appreciated. Thanks!
Your grammar recognises a single stmt. Yacc/bison expect the grammar to describe the entire input, so after the statement is recognised, the parser waits for an end-of-input indication. But it doesn't get one, since you typed a second statement. That causes the parser to report a syntax error. But note that it has now read the first token in the second line.
You are calling yyparse() in a loop and not stopping when you get a syntax error return value. So when you call yyparse() again, it will continue where the last one left off, which is just before the second token in the second line. What remains is just a single word, which it then correctly parses.
What you probably should do is write your parser so that it accepts any number of statements, and perhaps so that it does not die when it hits an error. That would look something like this:
%%
prog: %empty
| prog line
line: stmt '\n' { puts("Got a statement"); }
| error '\n' { yyerrok; /* Simple error recovery */ }
...
Note that I print a message for a statement only after I know that the line was correctly parsed. That usually turns out to be less confusing. But the best solution is not use printf's, but rather to use Bison's trace facility, which is as simple as putting -t on the bison command line and setting the global variable yydebug = 1;. See Tracing your parser
Here are my lex and yacc file to recognise palindrome strings but it is giving "INVALID "for both valid as well as invalid string. Please help me to find the problem, I am new to lex and yacc. Thanx in advance
LEX file
%{
#include "y.tab.h"
%}
%%
a return A;
b return B;
. return *yytext;
%%
YACC file
%{
#include<stdio.h>
#include "lex.yy.c"
int i=0;
%}
%token A B
%%
S: pal '\n' {i=1;}
pal:
| A pal A {printf("my3");i=1;}
| B pal B {printf("my4");i=1;}
| A {printf("my1");i=1;}
| B {printf("my2");i=1;}
;
%%
int main()
{
printf("Enter Valid string\n");
yyparse();
if(i==1)
printf("Valid");
return 0;
}
int yyerror(char* s)
{
printf("Invalid\n");
return 0;
}
Example : entered string is : aba
expected output should be VALID but it is giving INVALID
It is impossible to solve this problem with Yacc.
Yacc is a LALR(1) parser generator. LALR refers to a class of grammars. A grammar is a math tool to reason about parsing. One in parens refers to the lookahead - that is a max number of tokens we consider before definitely deciding which of the alternative productions (or "rules") to follow. Remember, the parsing algorithm is one pass, it can't backtrack and try another alternative as some regular expression engines do.
Concerning your palindrom problem, when a parser encounters 'a', it has to pick the right choice somehow
pal: A - 'a' alone is a valid palindrome all by itself, let's call it the inner core
pal: [A] pal A - outter layer, increasing nesting level
pal: A pal [A] - outter layer, decreasing nesting level
Making the right choice is impossible without infinite lookahead, but Yacc has only one token of lookahead.
The way Yacc handles this grammar is interesting as well.
If a grammar is ambiguous or not LR(1) the generated stack automata is non-deterministic. There are some builtin tools to fix it.
The first tool is priorities and associativity to deal with operators in programming languages (not relevant here).
Another one is a quirk - by default Yacc prefers "shift" to "reduce". These two are technicalities reffering to the internal operation of the parse algorithm. Basically tokens are "shift" into a stack. Once a group of tokens on the top match a rule it is possible to "reduce" them, replacing entire group with the single non-terminal from the left side of the rule.
Hence once we have 'a' at the top, we can either reduce it to a pal, or we can shift another token in assuming that a nested pal will emerge eventually. Yacc prefers the later.
The reason for this preference? The same ambiguity arrise in if-then-else statement in most languages. Consider two nested if statements but only one else clause. Yacc attaches else to the innermost if statement which seams to be the right thing to do.
Besides Yacc can generate a report highlighting issues in the grammar like shift-reduce conflicts mentioned above.
In the continuation of #ChrisDod and #NickZavaritsky comments, I add a working version of the glr (bison) parser.
%option noyywrap
%%
a return A;
b return B;
\n return '\n';
. {fprintf(stderr, "Error\n"); exit(1);}
%%
and Yacc / bison
%{
#include <stdio.h>
int i=0;
%}
%token A B
%glr-parser
%%
S : pal '\n' {i=1; return 1 ;}
| error '\n' {i=0; return 1 ;}
pal: A pal A
| B pal B
| A
| B
|
;
%%
#include "lex.yy.c"
int main() {
yyparse();
if(i==1) printf("Valid\n");
else printf("inValid\n");
return 0;
}
int yyerror(char* s) { return 0; }
Some changes were introduced in the lexer: (1) \n was missing; (2) unknown chars are now fatal errors;
The error recovery error was used to obtain the "invalid palindrome" situations.
here is my lex yacc code to parse an XML file and print the contents between and tags.
LEX
%{
%}
%%
"<XML>" {return XMLSTART;}
"</XML>" {return XMLEND;}
[a-z]+ {yylval=strdup(yytext); return TEXT;}
"<" {yylval=strdup(yytext);return yytext[0];}
">" {yylval=strdup(yytext);return yytext[0];}
"\n" {yylval=strdup(yytext);return yytext[0];}
. {}
%%
YACC
%{
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define YYSTYPE char *
%}
%token XMLSTART
%token XMLEND
%token TEXT
%%
program : XMLSTART '\n' A '\n' XMLEND {printf("%s",$3);
}
A : '<' TEXT '>' '\n' A '\n' '<' TEXT '>' { $$ = strcat($1,strcat($2,strcat($3,strcat($4,strcat($5,strcat($6,strcat($7,strcat($8,$9))))))));}
| TEXT
%%
#include"lex.yy.c"
I'm getting Syntax error, tried using ECHOs at some places but didn't find the error.
The input file I'm using is:
<XML>
<hello>
hi
<hello>
</XML>
Please help me figure out the error. I have relatively less experience using lex and yacc
That grammar will only successfully parse a file which has XMLEND at the end. However, all text files end with a newline.
Although you could presumably fix that by adding a newline at the end of the start rule, it's almost always a bad idea to try to parse whitespace. In general, except for line-oriented languages -- which xml is not -- it is best to ignore whitespace.
Your use of strcat is incorrect. Quoting man strcat from a GNU/Linux system:
The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.
You might want to use asprintf if it exists in your standard library.
Also, you never free() the strings produced by strdup, so all of them leak memory. In general, it's better to not set strdup tokens whose string representation is known -- particularly single-character tokens -- but the important thing is to keep track of the tokens whose string value has been freshly allocated. That would apply to semantic values produced with asprintf if the above suggestion is taken.
My apologies if the title of this thread is a little confusing. What I'm asking about is how does Flex (the lexical analyzer) handle issues of precedence?
For example, let's say I have two tokens with similar regular expressions, written in the following order:
"//"[!\/]{1} return FIRST;
"//"[!\/]{1}\< return SECOND;
Given the input "//!<", will FIRST or SECOND be returned? Or both? The FIRST string would be reached before the SECOND string, but it seems that returning SECOND would be the right behavior.
The longest match is returned.
From flex & bison, Text Processing Tools:
How Flex Handles Ambiguous Patterns
Most flex programs are quite ambiguous, with multiple patterns that can match
the same input. Flex resolves the ambiguity with two simple rules:
Match the longest possible string every time the scanner matches input.
In the case of a tie, use the pattern that appears first in the program.
You can test this yourself, of course:
file: demo.l
%%
"//"[!/] {printf("FIRST");}
"//"[!/]< {printf("SECOND");}
%%
int main(int argc, char **argv)
{
while(yylex() != 0);
return 0;
}
Note that / and < don't need escaping, and {1} is redundant.
bart#hades:~/Programming/GNU-Flex-Bison/demo$ flex demo.l
bart#hades:~/Programming/GNU-Flex-Bison/demo$ cc lex.yy.c -lfl
bart#hades:~/Programming/GNU-Flex-Bison/demo$ ./a.out < in.txt
SECOND
where in.txt contains //!<.