How to fetch the row and column number of error - yacc

How can I get the row and column number of an error (i.e., which part of the input string does not follow the grammar rules)?
I am using a yacc parser to check the grammar.
Thank you.

You would do well to read the Dragon Book (by Aho et al.), which explains and shows examples of how to write a lex/yacc-based compiler.
To get the line and column of the error, your lexer has to keep track of them. So in your lexer, declare two globals, SourceLine and SourceCol (of course you can use better, non-camel-cased names).
In each token rule, you have to calculate the column of the produced token. For that purpose I use the following macro, whose third argument is 1 to advance the column or 0 to reset it:
#define Return(a, b, c) \
{\
SourceCol = (SourceCol + yyleng) * c; \
DPRINT ("## Source line: %d, returned token: "a".\n", SourceLine); \
return b; \
}
and the token rule, using that macro, is:
"for" { Return("FOR", FOR, 1); }
Then, to keep track of lines, for each token that consists of newlines, I'm using:
{NEWLINES} {
BEGIN(INITIAL);
SourceLine += yyleng;
Return("LINE", LINE, 0);
}
Then in your parser, you can get SourceCol and SourceLine if you declare those as extern globals:
extern unsigned int SourceCol;
extern unsigned int SourceLine;
and now in your parse_error grammar production, you can do:
parse_error : LEXERROR
{
printf("OMG! Your code sucks at line %u and col %u!", SourceLine, SourceCol);
}
Of course you may want to add yytext, produce a more verbose error message, etc. But all that's up to you!
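As a rough, self-contained illustration of the bookkeeping described above, here is a plain C sketch that updates a line/column pair from a token's text, the way a lexer action could with yytext and yyleng. The names src_pos, advance_pos and format_error are made up for this example; they are not part of lex or yacc, and unlike the macro above the sketch resets the column by scanning for newline characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical position record: 1-based line, 0-based column offset. */
struct src_pos {
    unsigned line;
    unsigned col;
};

/* Advance the position past one token's text (what a lexer sees as
 * yytext/yyleng). A newline bumps the line and resets the column. */
void advance_pos(struct src_pos *pos, const char *text, size_t len)
{
    assert(pos != NULL && text != NULL);
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            pos->line++;
            pos->col = 0;
        } else {
            pos->col++;
        }
    }
}

/* Format a message for a token that starts at *pos (column shown 1-based).
 * Returns what snprintf returns. */
int format_error(char *buf, size_t size, const struct src_pos *pos,
                 const char *token)
{
    assert(buf != NULL && pos != NULL && token != NULL);
    return snprintf(buf, size, "syntax error at line %u, column %u near \"%s\"",
                    pos->line, pos->col + 1, token);
}
```

A lexer action would call advance_pos(&pos, yytext, yyleng) after each match, and the error rule would call format_error with the position saved before the offending token.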

Yacc/bison: what's wrong with my syntax equations?

I'm writing a "compiler" of sorts: it reads a description of a game (with rooms, characters, things, etc.) Think of it as a visual version of an Adventure-style game, but with much simpler problems.
When I run my "compiler" I'm getting a syntax error on my input, and I can't figure out why. Here's the relevant section of my yacc input:
character
: char-head general-text character-insides { PopChoices(); }
;
character-insides
: LEFTBRACKET options RIGHTBRACKET
;
char-head
: char-namesWT opt-imgsWT char-desc opt-cond
;
char-desc
: general-text { SetText($1); }
;
char-namesWT
: DOTC ID WORD { AddCharacter($3, $2); expect(EXP_TEXT); }
;
opt-cond
: %empty
| condition
;
condition
: condition-reason condition-main general-text
{ AddCondition($1, $2, $3); }
;
condition-reason
: DOTU { $$ = 'u'; }
| DOTV { $$ = 'v'; }
;
condition-main
: money-conditionWT
| have-conditionWT
| moves-conditionWT
| flag-conditionWT
;
have-conditionWT
: PERCENT_SLASH opt-bang ID
{ $$ = MkCondID($1, $2, $3) ; expect(EXP_TEXT); }
;
opt-bang
: %empty { $$ = TRUE; }
| BANG { $$ = FALSE; }
;
ID: WORD
;
Things in all caps are terminal symbols, things in lower or mixed case are non-terminals. If a non-terminal ends in WT, then it "wants text". That is, it expects that what comes after it may be arbitrary text.
Background: I have written my own token recognizer in C++ because(*) I want the syntax to be able to change the lexer's behavior. Two types of tokens should be matched only when the syntax expects them: FILENAME (with slashes and other non-alphanumeric characters) and TEXT, which means "all the text from here to the end of the line" (but not starting with certain keywords).
The function "expect" tells the lexer when to look for these two symbols. The expectation is reset to EXP_NORMAL after each token is returned.
I have added code to yylex that prints out the tokens as it recognizes them, and it looks to me like the tokenizer is working properly -- returning the tokens I expect.
(*) Also because I want to be able to ask the tokenizer for the column where the error occurred, and get the contents of the line being scanned at the time so I can print out a more useful error message.
Here is the relevant part of the input:
.c Wendy wendy
OK, now you caught me, what do you want to do with me?
.u %/lasso You won't catch me like that.
[
Here is the last part of the debugging output from yylex:
token: 262: DOTC/
token: 289: WORD/Wendy
token: 289: WORD/wendy
token: 292: TEXT/OK, now you caught me, what do you want to do with me?
token: 286: DOTU/
token: 274: PERCENT_SLASH/%/
token: 289: WORD/lasso
token: 292: TEXT/You won't catch me like that.
token: 269: LEFTBRACKET/
Here's my error message:
: line 124, columns 3-4: syntax error, unexpected LEFTBRACKET, expecting TEXT
[
To help you understand the equations above, here is the relevant part of the description of the input syntax that I wrote the yacc code from.
// Character:
// .c id charactername,[imagename,[animationname]]
// description-text
// .u condition on the character being usable [optional]
// .v condition on the character being visible [optional]
// [
// (options)
// ]
// Conditions:
// %$[-]n Must [not] have at least n dollars
// %/[-]name Must [not] have named thing
// %t-nnn At/before specified number of moves
// %t+nnn At/after specified number of moves
// %#[-]name named flag must [not] be set
// Condition-char: $, /, t, or #, as described above
//
// Condition:
// % condition-char (identifier/int) ['/' text-if-fail ]
// description-text: Can be either on-line text or multi-line text
// On-line text is the rest of the line
brackets mark optional non-terminals, but a bracket standing alone (represented by LEFTBRACKET and RIGHTBRACKET in the yacc) is an actual token, e.g.
// [
// (options)
// ]
above.
What am I doing wrong?
To debug parsing problems in your grammar, you need to understand the shift/reduce machine that yacc/bison produces (described in the .output file produced with the -v option), and you need to look at the trail of states that the parser goes through to reach the problem you see.
To enable debugging code in the parser (which can print the states and the shift and reduce actions as they occur), you need to compile with -DYYDEBUG or put #define YYDEBUG 1 in the top of your grammar file. The debugging code is controlled by the global variable yydebug -- set to non-zero to turn on the trace and zero to turn it off. I often use the following in main:
#ifdef YYDEBUG
extern int yydebug;
if (char *p = getenv("YYDEBUG"))
yydebug = atoi(p);
#endif
Then you can include -DYYDEBUG in your compiler flags for debug builds and turn on the debugging code by something like setenv YYDEBUG 1 to set the envvar prior to running your program.
I suppose your syntax error message was generated by bison. What is striking is that it claims to have found a LEFTBRACKET when it expects a [. Naively, you might expect it to be satisfied with the LEFTBRACKET it found, but of course bison knows nothing about LEFTBRACKET except its numeric value, which will be some integer larger than 256.
The only reason bison might expect [ is if your grammar includes the terminal '['. But since your scanner seems to return LEFTBRACKET when it sees a [, the parser will never see '['.

Flex/Lex - How to know if a variable was declared

My grammar allows:
C → id := E // assign a value/expression to a variable (VAR)
C → print(id) // print variables(VAR) values
To get it done, my lex file is:
[a-z] {
yylval.var_index=get_var_index(yytext);
return VAR;
}
get_var_index returns the index of the variable in the list, if it does not exist then it creates one.
It is working!
The problem is:
Every time a variable is matched in the lex file, it creates an index for that variable.
I have to report an error if 'print(a)' is called and 'a' was not declared, and that will never happen since print(a) always creates an index for 'a'.
How can I solve it?
Piece of yacc file:
%union {
int int_val;
int var_index;
}
%token <int_val> INTEGER
%token <var_index> VAR
...
| PRINT '(' VAR ')'{
n_lines++;
printf("%d\n",values[$3]);
}
...
| VAR {$$ =values[$1];}
This does seem a bit like a Computer Science class homework question for us to do.
Normally one would not use bison/yacc in this way. One would do the parse with bison/yacc and build a parse tree, which then gets walked to perform semantic checks, such as checking for declaration before use and so on. The identifiers would normally be managed in a symbol table, rather than just a table of values, so that other attributes, such as whether the identifier has been declared, can be managed too. It's for these reasons that it looks like an exercise rather than a realistic application of the tools. OK; those disclaimers disposed of, let's get to an answer.
The problem would be solved by remembering what has been declared and what not. If one does not plan to use a full symbol table then a simple array of booleans indicating which are the valid values could be used. The array can be initialised to false and set to true on declaration. This value can be checked when a variable is used. As C uses ints for boolean we can use that. The only changes needed are in the bison/yacc. You omitted any syntax for the declarations, but as you indicated they are declared there must be some. I guessed.
%union {
int int_val;
int var_index;
}
%{
int declared[MAX_TABLE_SIZE]; /* must be all zero before starting parse; a global array starts out that way */
%}
%token <int_val> INTEGER
%token <var_index> VAR
...
| DECLARE '(' VAR ')' { n_lines++; declared[$3] = 1; }
...
| PRINT '(' VAR ')'{
n_lines++;
if (declared[$3]) printf("%d\n",values[$3]);
else printf("Variable undeclared\n");
}
...
| VAR {$$ =values[$1]; /* perhaps need to show more syntax to show how VAR used */}
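The bookkeeping behind this can be sketched in plain C, outside of yacc. This is only an illustration under some assumptions: the names var_slot, assign_var and print_var are hypothetical, variables are single letters as in the [a-z] lexer rule, and an assignment is treated as the declaration, which the question does not actually confirm.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_VARS 26              /* one slot per letter a..z (assumption) */

static int values[MAX_VARS];
static int declared[MAX_VARS];   /* file scope, so zero-initialized */

/* Map a single-letter name to its slot, or -1 if it is not a..z. */
int var_slot(char name)
{
    return (name >= 'a' && name <= 'z') ? name - 'a' : -1;
}

/* What an assignment rule would call: store the value, mark as declared. */
void assign_var(int slot, int value)
{
    assert(slot >= 0 && slot < MAX_VARS);
    values[slot] = value;
    declared[slot] = 1;
}

/* What the print rule would call. Returns 1 if the value was printed,
 * 0 if the variable was never declared. */
int print_var(int slot)
{
    assert(slot >= 0 && slot < MAX_VARS);
    if (!declared[slot]) {
        fprintf(stderr, "Variable '%c' undeclared\n", 'a' + slot);
        return 0;
    }
    printf("%d\n", values[slot]);
    return 1;
}
```

The point of the design is that looking a name up never marks it as declared; only the rule that introduces the variable does.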

failed to parse number by yacc and lex

I have finished my lex file and have started to learn about yacc,
but I have a question about part of my lex code:
%{
#include "y.tab.h"
int num_lines = 1;
int comment_mode=0;
int stack =0;
%}
digit ([0-9])
integer ({digit}+)
float_num ({digit}+\.{digit}+)
%%
{integer} { //deal with integer
printf("#%d: NUM:",num_lines); ECHO;printf("\n");
yylval.Integer = atoi(yytext);
return INT;
}
{float_num} {// deal with float
printf("#%d: NUM:",num_lines);ECHO;printf("\n");
yylval.Float = atof(yytext);
return FLOAT;
}
\n { ++num_lines; }
. if(strcmp(yytext," "))ECHO;
%%
int yywrap() {
return 1;
}
Every time I get an integer or a float, I return the token and save its value in yylval.
Here is my code in parser.y:
%{
#include <stdio.h>
#define YYDEBUG 1
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
%}
%union{
int Integer;
float Float;
}
%token <int>INT;
%token <float>FLOAT;
%%
statement :
INT {printf("int yacc\n");}
| FLOAT {printf("float yacc\n");}
|
;
%%
int main(int argc, char** argv)
{
yyparse();
return 0;
}
which compiled by
byacc -d parser.y
lex lex.l
gcc lex.yy.c y.tab.c -ll
Since I just want to try something easy to get started, I want to see whether I can parse
only int and float numbers first. I print them in both the .l and the .y file after I input an
integer or a float. At the beginning I input a first number, for example 123,
and my program prints:
#1: NUM:123
in yylex() and
int yacc
in parser.y.
But if I input a second number, it shows a syntax error and the program shuts down.
I don't know where the problem is.
Is there any solution?
Your grammar only accepts a single token, either an INT or a FLOAT. So it will only accept a single number, which is why it produces a syntax error when it reads the second number; it is expecting an end-of-file.
The solution is to change the grammar so that it accepts any number of "statements":
program: /* EMPTY */
| program statement
;
Two notes:
1) You don't need an (expensive) strcmp in your lexer. Just do this:
" " /* Do nothing */;
. { return yytext[0]; }
It's better to return the unknown character to the parser, which will produce a syntax error if the character doesn't correspond to any token type (as in your simple grammar) than to just echo the character to stdout, which will prove confusing. Some people would prefer to produce an error message in the lexer for invalid input, but while you are developing a grammar I think it is easier to just pass through the characters, because that lets you add operators to your parser without regenerating the lexer.
2) When you specify types in bison's %token and %type declarations, you use the tag name from the union, not the C type. Some (but not all) versions of bison let you get away with using the C type if it is a simple type, but you can't count on it; it's not part of the POSIX standard, and it may well break if you use an older or newer version of bison. (For example, it won't work with bison 3.0.) So you should write, for example:
%union{
int Integer;
float Float;
}
%token <Integer>INT;
%token <Float>FLOAT;

printf(), fprintf(), wprintf() and NSLog() won't print on Xcode

I'm doing a small app for evaluating and analyzing transfer functions. As boring as the subject might seem to some, I want it to at least look extra cool and pro and awesome etc... So:
Step 1: Gimme teh coefficients! [A bunch of numbers]
Step 2: I'll write the polynomial with its superscripts. [The bunch of numbers in a string]
So, I write a little C parser to just print the polynomial with a decent format, for that I require a wchar_t string that I concatenate on the fly. After the string is complete I quickly try printing it on the console to check everything is ok and keep going. Easy right? Welp, I ain't that lucky...
wchar_t *polynomial_description( double *polyArray, char size, char var ){
wchar_t *descriptionString, temp[100];
int len, counter = 0;
SUPERSCRIPT superscript;
descriptionString = (wchar_t *) malloc(sizeof(wchar_t) * 2);
descriptionString[0] = '\0';
while( counter < size ){
superscript = polynomial_utilities_superscript( size - counter );
len = swprintf(temp, 100, L"%2.2f%c%c +", polyArray[counter], var, superscript);
printf("temp size: %d\n", len);
descriptionString = (wchar_t *) realloc(descriptionString, sizeof(wchar_t) * (wcslen(descriptionString) + len + 1) );
wcscat(descriptionString, temp);
counter++;
}
//fflush(stdout); //Already tried this
len = wprintf(L"%ls\n", descriptionString);
len = printf("%ls**\n", descriptionString);
len = fprintf(stdout, "%ls*\n", descriptionString);
len = printf("FFS!! Print something!");
return descriptionString;
}
During the run we can see temp size: 8 printed the expected number of times ONLY WHILE DEBUGGING; if I just run the program, I get an arbitrary number of prints on each run. But after that, as the title states, wprintf, printf and fprintf don't print anything, yet len does change its value after each call.
In the caller function (application:(UIApplication *)application didFinishLaunchingWithOptions:, while testing) I put an NSLog to print the returned string, and I don't get ANYTHING, not even the Log part.
What's happening? I'm at a complete loss.
I'm on Xcode 4.2, by the way.
What's the return value from printf/wprintf in the case where you think it's not printing anything? It should be returning either -1 in the case of a failure or 1 or more, since if successful, it should always print at least the newline character after the description string.
If it's returning 1 or more, is the newline getting printed? Have you tried piping the output of your program to a hex dumper such as hexdump -C or xxd(1)?
If it's returning -1, what is the value of errno?
If it turns out that printf is failing with the error EILSEQ, then what's quite likely happening is that your string contains some non-ASCII characters in it, since those cause wcstombs(3) to fail in the default C locale. In that case, the solution is to use setlocale(3) to switch into a UTF-8 locale when your program starts up:
#include <locale.h>

int main(int argc, char **argv)
{
// Run "locale -a" in the Terminal to get a list of all valid locales
setlocale(LC_ALL, "en_US.UTF-8");
...
}
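To check the EILSEQ theory without relying on what the console shows, something along these lines might help. It is a sketch, not a drop-in fix: multibyte_length and try_locale are hypothetical helper names, and calling wcstombs with a null destination to get the required length is POSIX behavior rather than something every C library guarantees.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns how many bytes the wide string needs in the current locale's
 * multibyte encoding, or -1 if some character cannot be converted.
 * A -1 here is the situation in which printf("%ls", ...) would fail.
 * Assumes the POSIX behavior of wcstombs(NULL, ...). */
long multibyte_length(const wchar_t *ws)
{
    assert(ws != NULL);
    size_t n = wcstombs(NULL, ws, 0);
    return n == (size_t)-1 ? -1L : (long)n;
}

/* Returns 1 if the named locale could be selected, 0 otherwise.
 * setlocale returns NULL when the locale is not installed. */
int try_locale(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_ALL, name) != NULL;
}
```

Calling multibyte_length on the description string before and after try_locale("en_US.UTF-8") should show whether the superscript characters are what makes the conversion fail.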

loop on prompt with a yes or no?

Good afternoon,
I'm trying to accomplish a task that I know should be doable; however, my attempts seem to fail every time. My endeavor is to learn to code in Objective-C, and I have been making good progress. What I would like to do is add a loop to my current application that asks at the end whether I would like to run again, or something to that effect, and takes a yes or no reply. If no, the program ends, and if yes, it jumps back to the top of the program to start all over, kind of like what I have below. Forgive me, please, if it's not quite perfect; I'm still getting used to programming and am finding it incredibly fun.
#include <stdio.h>
int main(void)
{
char loop = yes;
while (loop = yes)
{
.
.
.
}
printf ("would you like to continue (yes/no)/n");
scanf ("%s", loop);
}
The printf and scanf need to be moved up inside the curly braces of the while loop. Also, you want \n instead of /n in the printf. Finally, you're going to get a string back with that scanf() call, so you'll want to declare loop as a char array, and then in the while loop, check the first element of that array for a 'y' or 'n' or something like that. You might also want to look at getchar() instead of scanf() for that sort of thing.
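The first-character check suggested above could look roughly like this. It is a sketch under assumptions: is_yes and ask_continue are hypothetical names, and fgets is used in place of scanf so that an over-long answer cannot overflow the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if the answer, after leading whitespace, starts with
 * 'y' or 'Y'; returns 0 for anything else, including an empty string. */
int is_yes(const char *answer)
{
    assert(answer != NULL);
    while (isspace((unsigned char)*answer))
        answer++;
    return tolower((unsigned char)*answer) == 'y';
}

/* Prints the prompt on `out` and reads one line from `in`.
 * Returns 1 to run again, 0 to stop (also on end of input). */
int ask_continue(FILE *in, FILE *out)
{
    char line[64];
    assert(in != NULL && out != NULL);
    fputs("Would you like to continue (yes/no)? ", out);
    fflush(out);
    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    return is_yes(line);
}
```

The body of the program would then sit inside do { ... } while (ask_continue(stdin, stdout)); so that it runs once before the first prompt.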
Not compiled here, but should work:
#include <stdio.h>
#include <string.h>
int main(void)
{
char buffer[256];
do {
.
.
.
printf ("would you like to continue (yes/no)\n");
scanf ("%255s", buffer); /* read at most 255 characters */
} while (strcmp(buffer,"yes") == 0); /* repeat while the answer is "yes" */
}
One wouldn't do anything like that in a real world application, but for demonstration purpose it should be ok.
I made your variable an array, because strings are arrays of characters in C. Length is set to 256 bytes (255 characters + 0-byte as delimiter). I changed the loop to do-while to make it run at least once. For string comparison you need to call a function. strcmp returns 0 for identical strings. Finally, the question belongs in the loop.
It is plain C though, using nothing of Objective-C.
#include <stdio.h>
int main(void) {
char Answer;
printf("Does the subject have a glazed over look? (y/n): \n");
scanf(" %c",&Answer); /* the leading space skips any leftover whitespace */
if (Answer=='n'||Answer=='y'||Answer=='N'||Answer=='Y')
printf("Good\n");
else
printf("Please enter 'y' or 'n' \n ");
return 0;
}
#include <stdio.h>
#include <string.h>
int main(void)
{
char loop[10];
avi: /* label to jump back to */
.
.
.
printf ("would you like to continue (yes/no)\n");
scanf ("%9s", loop); /* read at most 9 characters */
if(strcmp(loop,"yes")==0) goto avi;
return 0;
}