Lex and Yacc do not report an error when an unexpected character is parsed

Lex and Yacc are not reporting an error when an unexpected character is parsed. In the code below, there is no error when #set label sample is parsed, but the # is not valid.
Lex portion of code
identifier [\._a-zA-Z0-9\/]+
<INITIAL>{s}{e}{t} {
return SET;
}
<INITIAL>{l}{a}{b}{e}{l} {
return LABEL;
}
<INITIAL>{i}{d}{e}{n}{t}{i}{f}{i}{e}{r} {
strncpy(yylval.str, yytext,1023);
yylval.str[1023] = '\0';
return IDENTIFIER;
}
Yacc portion of code.
definition : SET LABEL IDENTIFIER
{
cout<<"set label "<<$3<<endl;
};
When #set label sample is parsed, an error should be reported because # is an unexpected character, but none is. How should I modify the code so that an error is reported?

(Comments converted to a SO style Q&A format)
@JonathanLeffler wrote:
That's why you need a default rule in the lexical analyzer (typically the LHS is .) that arranges for an error to be reported. Without it, the default action is just to echo the unmatched character and proceed onwards with the next one.
At the least you would want to include the specific character that is causing trouble in the error message. You might well want to return it as a single-character token, which will generally trigger an error in the grammar. So:
<*>. { cout << "Error: unexpected character " << yytext << endl; return *yytext; }
might be appropriate.
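The effect of such a catch-all rule can be sketched outside of lex. The following hand-written C++ scanner is only an illustration under stated assumptions: the names Token, scan and isIdentChar and the numeric token codes are hypothetical, and unlike the asker's rules it matches the keywords in lower case only. Any character that no other rule accepts is reported and handed back as a single-character token, which is what the <*>. rule above arranges.

```cpp
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical token codes for this sketch; yacc assigns the real ones.
enum TokenCode { SET = 258, LABEL = 259, IDENTIFIER = 260 };

struct Token {
    int code;          // a TokenCode, or the raw character for unmatched input
    std::string text;  // the matched text
};

// True for characters allowed by the identifier pattern [._a-zA-Z0-9/]+
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) ||
           c == '.' || c == '_' || c == '/';
}

std::vector<Token> scan(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        char c = input[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // skip whitespace between tokens
            continue;
        }
        if (isIdentChar(c)) {
            std::size_t start = i;
            while (i < input.size() && isIdentChar(input[i])) ++i;
            std::string word = input.substr(start, i - start);
            int code = word == "set" ? SET
                     : word == "label" ? LABEL
                     : IDENTIFIER;
            tokens.push_back({code, word});
            continue;
        }
        // Catch-all, playing the role of the "." rule: report the character
        // and return it as a single-character token so the grammar rejects it.
        std::cerr << "Error: unexpected character " << c << '\n';
        int raw = static_cast<unsigned char>(c);
        tokens.push_back({raw, std::string(1, c)});
        ++i;
    }
    return tokens;
}
```

Calling scan("#set label sample") yields four tokens, the first of which carries the code '#'. A grammar that only knows SET LABEL IDENTIFIER has no rule starting with that token, so the parser would report a syntax error instead of silently skipping the character.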

Related

Making cin take selective inputs [Turbo C++]

Don't hate cause of Turbo, I already hate my school!
I wish to show an error msg if a character is entered instead of an int or float in some field such as age or percentage.
I wrote this function:
template <class Type>
Type modcin(Type var) {
take_input: //Label
int count = 0;
cin>>var;
if(!cin){
cin.clear();
cin.ignore();
for ( ; count < 1; count++) { //Printed only once
cout<<"\n Invalid input! Try again: ";
}
goto take_input;
}
return var;
}
but the output is not desirable: the error msg is printed several times for a single bad input.
How do I stop the error msg from being repeated multiple times?
Is there a better method?
NOTE: Please make sure that this is TurboC++ that we are talking about, I tried using the approach in this question, but even after including limits.h, it doesn't work.
Here is a code snippet in C++.
template <class Type>
Type modcin(Type var) {
int i=0;
do{
cin>>var;
int count = 0;
if(!cin) {
cin.clear();
cin.ignore(numeric_limits<streamsize>::max(), '\n');
for ( ; count < 1; count++) { //Printed only once
cout<<"\n Invalid input! Try again: ";
cin>>var;
}
}
} while (!cin);
return var;
}
The variables are named to match yours so you can follow it more easily. This code isn't perfect, though.
It can't handle input like "1fff"; there you would just get 1 back. I tried to fix that but ran into an infinite loop; I'll update the code once that is solved.
It also doesn't work in Turbo C++ as written. I don't know if there are alternatives, but the numeric_limits<streamsize>::max() argument gives a compiler error ('undefined symbol' error for numeric_limits & streamsize and 'prototype must be defined' error for max()) in Turbo C++.
So, to make it work in Turbo C++, replace the numeric_limits<streamsize>::max() argument with some big int value such as 100.
The stream will then discard input only until 100 characters have been skipped or a '\n' (Enter key/newline character) is found.
EDIT
The following code can be executed on both Turbo C++ or proper C++. The comments are provided to explain the functioning:
template <class Type> //Data Integrity Maintenance Function
Type modcin(Type var) { //for data types: int, float, double
cin >> var;
if (cin) { //Extracted an int, but it is unknown if more input exists
//---- The following code covers cases: 12sfds** -----//
char c;
if (cin.get(c)) { // Or: cin >> c, depending on how you want to handle whitespace.
cin.putback(c); //More input exists.
if (c != '\n') { // Doesn't work if you use cin >> c above.
cout << "\nType Error!\t Try Again: ";
cin.clear(); //Clears the error state of cin stream
cin.ignore(100, '\n'); //NOTE: Buffer Flushed <|>
var = modcin(var); //Recursive repetition
}
}
}
else { //In case, some unexpected operation occurs [Covers cases: abc**]
cout << "\nType Error!\t Try Again: ";
cin.clear(); //Clears the error state of cin stream
cin.ignore(100, '\n'); //NOTE: Buffer Flushed <|>
var = modcin(var);
}
return var;
//NOTE: The '**' represent any values from ASCII. Decimal, characters, numbers, etc.
}
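For comparison, here is a line-based variant of the same idea. It is only a sketch and assumes a standard C++ compiler with std::string and std::istringstream, so it will probably not build on Turbo C++. The name readValue and its parameters are made up for this example. Reading a whole line at a time means one bad line produces exactly one error message, and trailing junk such as "12x" is rejected without recursion.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Reads whole lines from `in` until one holds exactly one value of type T,
// optionally surrounded by whitespace. Prompts on `out` after each bad line.
// Returns false if the input ends before a valid value is seen.
template <class T>
bool readValue(std::istream& in, std::ostream& out, T& value) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        T parsed;
        char leftover;
        // Accept the line only if a T can be extracted and no
        // non-whitespace character follows it.
        if (fields >> parsed && !(fields >> leftover)) {
            value = parsed;
            return true;
        }
        out << "\n Invalid input! Try again: ";  // printed once per bad line
    }
    return false;
}
```

A call such as readValue(std::cin, std::cout, age) with an int age keeps asking until the user types a clean integer. Passing the streams as parameters is a design choice made here so the function can also be exercised with string streams.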

Yacc/bison: what's wrong with my syntax equations?

I'm writing a "compiler" of sorts: it reads a description of a game (with rooms, characters, things, etc.) Think of it as a visual version of an Adventure-style game, but with much simpler problems.
When I run my "compiler" I'm getting a syntax error on my input, and I can't figure out why. Here's the relevant section of my yacc input:
character
: char-head general-text character-insides { PopChoices(); }
;
character-insides
: LEFTBRACKET options RIGHTBRACKET
;
char-head
: char-namesWT opt-imgsWT char-desc opt-cond
;
char-desc
: general-text { SetText($1); }
;
char-namesWT
: DOTC ID WORD { AddCharacter($3, $2); expect(EXP_TEXT); }
;
opt-cond
: %empty
| condition
;
condition
: condition-reason condition-main general-text
{ AddCondition($1, $2, $3); }
;
condition-reason
: DOTU { $$ = 'u'; }
| DOTV { $$ = 'v'; }
;
condition-main
: money-conditionWT
| have-conditionWT
| moves-conditionWT
| flag-conditionWT
;
have-conditionWT
: PERCENT_SLASH opt-bang ID
{ $$ = MkCondID($1, $2, $3) ; expect(EXP_TEXT); }
;
opt-bang
: %empty { $$ = TRUE; }
| BANG { $$ = FALSE; }
;
ID: WORD
;
Things in all caps are terminal symbols, things in lower or mixed case are non-terminals. If a non-terminal ends in WT, then it "wants text". That is, it expects that what comes after it may be arbitrary text.
Background: I have written my own token recognizer in C++ because(*) I want the syntax to be able to change the lexer's behavior. Two types of tokens should be matched only when the syntax expects them: FILENAME (with slashes and other non-alphanumeric characters) and TEXT, which means "all the text from here to the end of the line" (but not starting with certain keywords).
The function "expect" tells the lexer when to look for these two symbols. The expectation is reset to EXP_NORMAL after each token is returned.
I have added code to yylex that prints out the tokens as it recognizes them, and it looks to me like the tokenizer is working properly -- returning the tokens I expect.
(*) Also because I want to be able to ask the tokenizer for the column where the error occurred, and get the contents of the line being scanned at the time so I can print out a more useful error message.
Here is the relevant part of the input:
.c Wendy wendy
OK, now you caught me, what do you want to do with me?
.u %/lasso You won't catch me like that.
[
Here is the last part of the debugging output from yylex:
token: 262: DOTC/
token: 289: WORD/Wendy
token: 289: WORD/wendy
token: 292: TEXT/OK, now you caught me, what do you want to do with me?
token: 286: DOTU/
token: 274: PERCENT_SLASH/%/
token: 289: WORD/lasso
token: 292: TEXT/You won't catch me like that.
token: 269: LEFTBRACKET/
Here's my error message:
: line 124, columns 3-4: syntax error, unexpected LEFTBRACKET, expecting TEXT
[
To help you understand the equations above, here is the relevant part of the description of the input syntax that I wrote the yacc code from.
// Character:
// .c id charactername,[imagename,[animationname]]
// description-text
// .u condition on the character being usable [optional]
// .v condition on the character being visible [optional]
// [
// (options)
// ]
// Conditions:
// %$[-]n Must [not] have at least n dollars
// %/[-]name Must [not] have named thing
// %t-nnn At/before specified number of moves
// %t+nnn At/after specified number of moves
// %#[-]name named flag must [not] be set
// Condition-char: $, /, t, or #, as described above
//
// Condition:
// % condition-char (identifier/int) ['/' text-if-fail ]
// description-text: Can be either on-line text or multi-line text
// On-line text is the rest of the line
In this description, brackets mark optional items, but a bracket standing alone on a line (represented by LEFTBRACKET and RIGHTBRACKET in the yacc) is an actual token, e.g.
// [
// (options)
// ]
above.
What am I doing wrong?
To debug parsing problems in your grammar, you need to understand the shift/reduce machine that yacc/bison produces (described in the .output file produced with the -v option), and you need to look at the trail of states that the parser goes through to reach the problem you see.
To enable debugging code in the parser (which can print the states and the shift and reduce actions as they occur), you need to compile with -DYYDEBUG or put #define YYDEBUG 1 in the top of your grammar file. The debugging code is controlled by the global variable yydebug -- set to non-zero to turn on the trace and zero to turn it off. I often use the following in main:
#ifdef YYDEBUG
extern int yydebug;
if (char *p = getenv("YYDEBUG"))
yydebug = atoi(p);
#endif
Then you can include -DYYDEBUG in your compiler flags for debug builds and turn on the debugging code by something like setenv YYDEBUG 1 to set the envvar prior to running your program.
I suppose your syntax error message was generated by bison. What is striking is that it claims to have found a LEFTBRACKET when it expects a [. Naively, you might expect it to be satisfied with the LEFTBRACKET it found, but of course bison knows nothing about LEFTBRACKET except its numeric value, which will be some integer larger than 256.
The only reason bison might expect [ is if your grammar includes the terminal '['. But since your scanner seems to return LEFTBRACKET when it sees a [, the parser will never see '['.
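The difference between a named token and a character literal can be made concrete with a few lines of C++. This is only an illustration: the value 269 is copied from the yylex trace in the question, bison assigns such numbers itself, and the function names are invented for this sketch.

```cpp
// Value taken from the yylex trace in the question
// ("token: 269: LEFTBRACKET/"); illustrative only, bison chooses it.
enum { LEFTBRACKET = 269 };

// What a scanner can return when it sees "[" in the input:
int namedToken()   { return LEFTBRACKET; }  // matches LEFTBRACKET in the grammar
int literalToken() { return '['; }          // matches '[' in the grammar

// The parser compares plain integers, so these are two unrelated tokens.
bool sameToken(int a, int b) { return a == b; }
```

Here sameToken(namedToken(), literalToken()) is false, so a grammar written with '[' never sees a match from a scanner that returns LEFTBRACKET, and the other way round.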

YACC: Can I generate "syntax error" from my "semantic" processing?

I'm writing a program in YACC and C/C++. It parses a fairly simple grammar and stores the results in some tables.
I have rules like
room: DOTR ID roomname { AddRoom($3, $2); };
and the code for AddRoom is:
void AddRoom(const char* name, const char* id)
{
theRoom = (void)new GameRoom(name, id);
if (!theGame->addRoom(theRoom)) {
?????
}
}
???? would be where I would insert code to generate a syntax error (I hope).
The purpose of this code is that every object in the game (rooms, doors, NPCs, things) has a unique ID. If theGame->addRoom detects that the ID is not unique, it will return false, and I want yacc to display an error message at that point in the input -- just as if an illegal token had been there.
Just call yyerror(), and remember that there was an error so you don't proceed to later stages. But you do not want to treat this as a syntax error: otherwise you will cause the parser to start discarding tokens etc.
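A minimal sketch of that approach follows. Everything apart from the names yyerror and AddRoom is an assumption made for the example: the asker's GameRoom and theGame->addRoom are replaced by a plain set of IDs, and g_errorCount and mayProceed are hypothetical helpers.

```cpp
#include <cstdio>
#include <set>
#include <string>

// Stand-ins for this sketch only: a real parser supplies its own yyerror,
// and the game tables are reduced to a set of room IDs.
static int g_errorCount = 0;
static std::set<std::string> g_roomIds;

void yyerror(const char* msg) {
    std::fprintf(stderr, "error: %s\n", msg);
    ++g_errorCount;  // remember that an error was reported
}

// Called from the grammar action:
//   room: DOTR ID roomname { AddRoom($3, $2); };
void AddRoom(const char* name, const char* id) {
    (void)name;  // the room itself is not modelled in this sketch
    if (!g_roomIds.insert(id).second) {
        // The ID was already present: report it, but let parsing continue.
        std::string msg = std::string("duplicate room ID '") + id + "'";
        yyerror(msg.c_str());
    }
}

// Later stages should run only if nothing was reported during the parse.
bool mayProceed() { return g_errorCount == 0; }
```

After yyparse() returns, the caller would check mayProceed() before going on. Because the parser is never told about a syntax error, it does not enter error recovery and does not discard tokens.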

How to get the original text using $text at the same time rule returns value

So I have the following grammar:
top_cmd :cmds
{
std::cout << $cmds.text << std::endl;
}
;
cmds returns [char* str]
: cmd+
{
str = new char('a');
}
;
I get g++ compile error:
"str" was not declared in this scope
If I remove this line
std::cout << $cmds.text << std::endl;
Then compile is fine.
I googled how "$text" is used, and it seems that $text is mainly meant to be used with rewrite rules. In my example, the function "cmds" returns "char*" when I remove the offending line and some complex structure when I keep it.
I can think of the following workaround:
1. do not have lower level rules return anything, but pass variable into lower level rules.
2. use re-write rules
But both are pretty big changes (my real project is fairly large) considering how little time I have budgeted.
So is there a short-cut? Basically I do not want to change the grammar of top_cmd and cmds, but I can get the full text of cmds' matching.
I am using ANTLR3C, but I believe this is independent of target language.
I think you should do something like this:
top_cmd :cmds
{
std::cout << $cmds.text << std::endl;}
;
cmds returns [char* str]
: cmd+
{
$str = new char('a'); //added $
}
;
$cmds.text will return the text matched by the rule cmds. If you want the returned str instead, use $cmds.str.
$text: The text matched for a rule or the text matched
from the start of the rule up until the point of
the $text expression evaluation. Note that this
includes the text for all tokens including those
on hidden channels, which is what you want
because usually that has all the whitespace and
comments. When referring to the current rule,
this attribute is available in any action including
any exception actions.
(from The Definitive ANTLR Reference)

antlr displayRecognitionError

Using ANTLR 3 with Java.
I'm having trouble printing the text from the token here:
@lexer::members {
public void displayRecognitionError(String[] tokenNames, RecognitionException e) {
System.err.println("Encountered an illegal char " + getText() + " on line "+getLine()+":"+getCharPositionInLine ());
}
}
I'm making a more detailed error report for the lexer grammar.
The thing is that when the error occurs (the user enters a token like : which isn't defined), it only shows "Encountered an illegal char on line x:y". It should show the invalid character : between "char" and "on line x:y".
What can I do to show the invalid character, line and column?