Bison - recover from If Else error - syntax-error

I'm trying to recover from an error in an If-Else statement.
In my grammar an If is always followed by an Else.
statement: OBRACES statements CBRACES
| IF OPAR exp CPAR statement ELSE statement
| IF OPAR exp CPAR statement error '\n' { yyerrok; yyclearin;}
;
The error found is in the commented else in the last lines:
public boolean Equal(Element other){
    if (!this.Compare(aux01,Age))
        ret_val = false ;
    //else
    //nt = 0 ;
}
error: syntax error, unexpected CBRACES, expecting ELSE -> } # line 29
The parser does not recover from that error, and it ignores the errors that come after it.
Maybe I'm not understanding how this kind of error recovery works, but I can only find two examples on every site about error recovery: "error '\n'" and "'(' error ')'".
Does anyone have an idea how to recover from this error (when an if is not followed by an else)?
Thanks

You haven't provided enough context to know exactly, but my guess is that the lexer/tokenizer, which feeds tokens to your parser, skips whitespace, including '\n'. So the parser never sees the newline and therefore never reduces the
IF OPAR exp CPAR statement error '\n'
production and so its action
{ yyerrok; yyclearin;}
never gets executed and so the error is not recovered.

Probably (although it is hard to say for sure without seeing more of the grammar) you don't need to skip any tokens in the case that else is not found.
The most likely case is that the program is simply lacking an else clause (perhaps because its author is used to other programming languages in which else is optional), and parsing can simply continue as though there were an empty else clause. So you should be able to just use:
IF OPAR exp CPAR statement error { yyerrok; }
(Note: I removed yyclearin because you almost certainly don't want to do that. In the case of the error in the OP, the result would be to ignore the '}' token, leading to extraneous errors later in the parse.)
You probably should take advantage of the action in this error production to produce a clear error message ("if statements must have else clauses"), although the default message is reasonably clear as well.
It is certainly the case that whatever tokens are used as error context must be tokens the scanner can actually produce. That generally precludes error recovery techniques such as "skip to the end of the line", except in languages in which newlines are syntactically significant.
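To make the difference between keeping and discarding the lookahead token concrete, here is a minimal sketch in Python rather than Bison. It is a hand-written recursive-descent parser for a toy version of the grammar, not the asker's real one: the Parser class, the token spelling, and the "NAME ;" simple statement are all assumptions made for illustration. When the else is missing it records an error, pretends there was an empty else clause, and leaves the current token alone, which is the analogue of calling yyerrok without yyclearin.

```python
class Parser:
    """Toy parser: statement := '{' statement* '}'
                              | 'if' '(' NAME ')' statement 'else' statement
                              | NAME ';'   (hypothetical simple statement)"""

    def __init__(self, tokens):
        self.tokens = tokens + ["<EOF>"]
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def statement(self):
        tok = self.peek()
        if tok == "{":
            self.expect("{")
            body = []
            while self.peek() not in ("}", "<EOF>"):
                body.append(self.statement())
            self.expect("}")
            return ("block", body)
        if tok == "if":
            self.expect("if")
            self.expect("(")
            cond = self.peek()
            self.pos += 1
            self.expect(")")
            then_branch = self.statement()
            if self.peek() == "else":
                self.pos += 1
                else_branch = self.statement()
            else:
                # Recovery: report the problem and act as if an empty else
                # clause had been written. The lookahead token is NOT
                # consumed, so the enclosing rule still sees it.
                self.errors.append(
                    f"if statements must have else clauses (found {self.peek()!r})"
                )
                else_branch = ("empty",)
            return ("if", cond, then_branch, else_branch)
        # Simple statement: NAME ';'
        self.pos += 1
        self.expect(";")
        return ("stmt", tok)


def parse(source):
    parser = Parser(source.split())
    tree = parser.statement()
    parser.expect("<EOF>")
    return tree, parser.errors
```

Calling parse("{ if ( c ) a ; }") reports exactly one error and still matches the closing brace. If the recovery branch also skipped the lookahead token, the '}' would be lost and the block would fail to close, which is the kind of follow-on error described above.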

I found out the problem. Thanks for the help anyway.
I put the error inside the braces and now it is working.
statement: OBRACES statements CBRACES
| OBRACES error CBRACES { yyerrok; yyclearin;}
| IF OPAR exp CPAR statement ELSE statement
;

Related

Alternation with EOF producing odd output

The following works fine:
testRoot
: sqlStatement EOF ?
;
However, if I add in the following:
testRoot
: sqlStatement (SEMI | EOF) ?
;
The same input now gives me the following error message:
line 1:8 extraneous input '<EOF>' expecting {<EOF>, SEMI}
Is there something that I'm missing here, or why is that error popping up?
EOF really isn't something that's optional. A token stream will always have an EOF token at the end of the stream.
testRoot: sqlStatement SEMI? EOF;
You always want an EOF at the end of a start rule. Without it, ANTLR may be able to recognize a portion of your input, and will discard the remainder without an error. With it, the rule says... this rule has to end with the EOF token, so it will consume all of the input to get to the EOF and will report any syntax errors it encounters.

Erratic parser. Same grammar, same input, cycles through different results. What am I missing?

I'm writing a basic parser that reads from stdin and prints results to stdout. The problem is that I'm having trouble with this grammar:
%token WORD NUM TERM
%%
stmt: /* empty */
| word word term { printf("[stmt]\n"); }
| word number term { printf("[stmt]\n"); }
| word term
| number term
;
word: WORD { printf("[word]\n"); }
;
number: NUM { printf("[number]\n"); }
;
term: TERM { printf("[term]\n"); /* \n */}
;
%%
When I run the program and type hello world\n, the output is (as I expected) [word] [word] [term] [stmt]. So far, so good, but if I then type hello world\n again, I get syntax error [word][term].
When I type hello world\n for the third time it works, then it fails again, then it works, and so on and so forth.
Am I missing something obvious in here?
(I have some experience with hand-rolled compilers, but I've not used lex/yacc et al.)
This is the main func:
int main() {
    do {
        yyparse();
    } while(!feof(yyin));
    return 0;
}
Any help would be appreciated. Thanks!
Your grammar recognises a single stmt. Yacc/bison expect the grammar to describe the entire input, so after the statement is recognised, the parser waits for an end-of-input indication. But it doesn't get one, since you typed a second statement. That causes the parser to report a syntax error. But note that it has now read the first token in the second line.
You are calling yyparse() in a loop and not stopping when you get a syntax error return value. So when you call yyparse() again, it will continue where the last one left off, which is just before the second token in the second line. What remains is just a single word, which it then correctly parses.
What you probably should do is write your parser so that it accepts any number of statements, and perhaps so that it does not die when it hits an error. That would look something like this:
%%
prog: %empty
| prog line
line: stmt '\n' { puts("Got a statement"); }
| error '\n' { yyerrok; /* Simple error recovery */ }
...
Note that I print a message for a statement only after I know that the line was correctly parsed. That usually turns out to be less confusing. But the best solution is not to use printf calls at all, but rather to use Bison's trace facility, which is as simple as putting -t on the bison command line and setting the global variable yydebug = 1;. See Tracing your parser.
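The effect of the prog/line grammar above can be imitated in a few lines of Python. This is only a simplified model of the idea, not a translation of what Bison generates: the classify and parse_program helpers and the rule that an all-digit token counts as NUM are assumptions. Each line is checked as a whole, a result is reported only after the complete line has been seen, and a bad line is skipped up to its newline so that the next line starts from a clean state.

```python
def classify(token):
    # Assumption for this sketch: all-digit tokens are NUM, the rest WORD.
    return "NUM" if token.isdigit() else "WORD"


# The statement shapes from the grammar, without the terminating newline.
# The empty tuple corresponds to the empty stmt alternative.
VALID_STATEMENTS = {
    (),
    ("WORD", "WORD"),
    ("WORD", "NUM"),
    ("WORD",),
    ("NUM",),
}


def parse_program(text):
    """Accept any number of lines; report each one independently."""
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        kinds = tuple(classify(tok) for tok in line.split())
        if kinds in VALID_STATEMENTS:
            results.append((lineno, "ok"))
        else:
            # Equivalent of `error '\n' { yyerrok; }`: drop the rest of
            # the line and carry on with the next one.
            results.append((lineno, "syntax error"))
    return results
```

With the input "hello world\nhello world\n1 2 3\nhello 42\n", the first, second, and fourth lines are accepted and only the third is reported as a syntax error, so one bad line no longer disturbs the lines after it.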

Parse error: syntax error, unexpected '"', expecting identifier (T_STRING) or variable (T_VARIABLE) or number (T_NUM_STRING) in C:... on line 22

I want to update a row in a table for my project. I'm copying a syntax I saw somewhere else here; however, I think my problem comes when I try updating where ApplicantID is equal to $_SESSION["ID"].
I get this error
Parse error: syntax error, unexpected '"', expecting identifier (T_STRING) or variable (T_VARIABLE) or number (T_NUM_STRING) in C:\xampp\...\InsertPData.php on line 22
Here is the PHP alongside the SQL:
<?php
include_once 'dbconnect.php';
session_start();
function INSERT()
{
    $Name=$_POST['name'];
    $Relation=$_POST['Relation'];
    $Email=$_POST['Email'];
    $Address=$_POST['Address'];
    $Postcode=$_POST['Postcode'];
    $Mobile_Number=$_POST['Mobile_Number'];
    $Home_Number=$_POST['Home_Number'];
    $INSERT="UPDATE Applicants
    SET ParentName='$Name',
    Relationtoapplicant='$Relation',
    ParentEmail='$Email',
    ParentAddress='$Address',
    ParentPostcode='$Postcode',
    ParentMobile='$Mobile_Number',
    ParentHome='$Home_Number',
    WHERE ApplicantID=$_SESSION["ID"] "; #THIS IS LINE 22
    $data=mysql_query($INSERT) or die(mysql_error());
    if($data)
    {
        echo "Parents/Gauridan details hav been entered";
    }
    else print "error";
}
INSERT()
?>
I've already searched for a solution to this but haven't found something where the user is using a session thing. Thank you.
The error message is telling you what is wrong. Inside a double-quoted string, PHP parses $_SESSION[ using its simple interpolation syntax, and in that syntax the array key must be a bare identifier, a variable, or a number. A quoted key is not allowed, so the parser stops at the '"' right after the bracket:
$INSERT="UPDATE Applicants
WHERE ApplicantID=$_SESSION["ID"] ";
Escaping the double quotes or swapping them for single quotes does not help, because quoted keys are rejected either way. Wrap the expression in curly braces instead, like:
$INSERT="UPDATE Applicants
WHERE ApplicantID={$_SESSION['ID']} ";
Or drop the quotes around the key:
$INSERT="UPDATE Applicants
WHERE ApplicantID=$_SESSION[ID] ";
Both forms interpolate the value of $_SESSION['ID'] into the string. Once the script parses, also remove the comma after ParentHome='$Home_Number', since a comma directly before WHERE is a SQL syntax error.
Also, please do not use mysql_ functions anymore. They have been deprecated since 2013 and are no longer even a part of PHP. So if you updated PHP to the latest version, this code would not work. On top of that, this code is vulnerable to SQL injection attacks.
Also see Why shouldn't I use mysql_* functions in PHP? and How can I prevent SQL-injection in PHP?.
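To show what "preventing SQL injection" looks like in practice, here is a small sketch of a parameterized UPDATE. It uses Python's standard sqlite3 module and an in-memory database purely so that it is self-contained and runnable; it is not the asker's PHP/MySQL stack, and the update_parent_name helper and the cut-down two-column Applicants table are assumptions. The same principle applies to prepared statements in PHP: the SQL text contains placeholders, and the values are passed separately instead of being pasted into the string.

```python
import sqlite3


def update_parent_name(conn, applicant_id, parent_name):
    # The '?' placeholders keep the values out of the SQL text, so quotes
    # or SQL keywords inside parent_name are stored as plain data.
    conn.execute(
        "UPDATE Applicants SET ParentName = ? WHERE ApplicantID = ?",
        (parent_name, applicant_id),
    )
    conn.commit()


# Hypothetical, cut-down version of the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Applicants (ApplicantID INTEGER PRIMARY KEY, ParentName TEXT)"
)
conn.executemany(
    "INSERT INTO Applicants (ApplicantID, ParentName) VALUES (?, ?)",
    [(1, ""), (2, "")],
)

# Input that would rewrite the WHERE clause if it were concatenated
# into the query string.
hostile = "x' WHERE 1=1 --"
update_parent_name(conn, 1, hostile)

rows = conn.execute(
    "SELECT ApplicantID, ParentName FROM Applicants ORDER BY ApplicantID"
).fetchall()
```

After this runs, only the row with ApplicantID 1 has changed, and its ParentName holds the hostile text verbatim; the second row is untouched.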

ANTLR: Use First Token of next statement to terminate the preceding one

I am having some issues related to statement termination for an SQL grammar. Currently it supports statement termination through the semicolon ';' token. Making the semicolon optional causes an out-of-memory error. I want to know whether I can match the first token of the next statement as a terminator for the previous one without consuming it. Here is a snippet of my parser grammar.
unit_statement
: statement SEMICOLON!
;
statement
options{
backtrack=true;
}
:
declare_statement
| assignment_statement
| sql_statement
;
Here the sql_statement rule must end with a semicolon but declare_statement should not. I am using ANTLR 3.

Antlr 3 keywords and identifiers colliding

Surprise, I am building an SQL like language parser for a project.
I had it mostly working, but when I started testing it against real requests it would be handling, I realized it was behaving differently on the inside than I thought.
The main issue in the following grammar is that I define a lexer rule PCT_WITHIN for the language keyword 'pct_within'. This works fine, but if I try to match a field like 'attributes.pct_vac', I get the field having text of 'attributes.ac' and a pretty ANTLR error of:
line 1:15 mismatched character u'v' expecting 'c'
GRAMMAR
grammar Select;
options {
language=Python;
}
eval returns [value]
: field EOF
;
field returns [value]
: fieldsegments {print $field.text}
;
fieldsegments
: fieldsegment (DOT (fieldsegment))*
;
fieldsegment
: ICHAR+ (USCORE ICHAR+)*
;
WS : ('\t' | ' ' | '\r' | '\n')+ {self.skip();};
ICHAR : ('a'..'z'|'A'..'Z');
PCT_CONTAINS : 'pct_contains';
USCORE : '_';
DOT : '.';
I have been reading everything I can find on the topic. How the Lexer consumes stuff as it finds it even if it is wrong. How you can use semantic predication to remove ambiguity/how to use lookahead. But everything I read hasn't helped me fix this issue.
Honestly I don't see how it even CAN be an issue. I must be missing something super obvious because other grammars I see have Lexer rules like EXISTS but that doesn't cause the parser to take a string like 'existsOrNot' and spit out an IDENTIFIER with the text of 'rNot'.
What am I missing or doing completely wrong?
Convert your fieldsegment parser rule into a lexer rule. As it stands now it will accept input like
"abc
_ abc"
which is probably not what you want. The keyword "pct_contains" won't be matched by this rule since it is defined separately. If you want to accept the keyword in certain sequences as a regular identifier, you will have to include it in the accepted identifier rule.
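The idea of recognizing a whole field segment as one lexer token can be sketched outside ANTLR. The following is a minimal regex-based tokenizer in Python, not ANTLR 3 output; the TOKEN_RE pattern, the KEYWORDS table, and the tokenize function are assumptions made for illustration. It always matches the longest identifier-shaped run first and only afterwards checks whether that complete text happens to be a keyword, so a keyword prefix such as pct_ can never swallow part of a longer name.

```python
import re

# Keyword text -> token type. Only whole-token matches count as keywords.
KEYWORDS = {"pct_contains": "PCT_CONTAINS"}

# One identifier token covers letters with single underscores in between,
# mirroring ICHAR+ (USCORE ICHAR+)* but as a single lexer-level pattern.
TOKEN_RE = re.compile(
    r"(?P<IDENT>[A-Za-z]+(?:_[A-Za-z]+)*)|(?P<DOT>\.)|(?P<WS>\s+)"
)


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "WS":
            continue  # whitespace separates tokens but is not emitted
        value = match.group()
        if kind == "IDENT":
            # Decide keyword vs. identifier only after the full text is known.
            kind = KEYWORDS.get(value, "IDENT")
        tokens.append((kind, value))
    return tokens
```

With this arrangement tokenize("attributes.pct_vac") yields an IDENT, a DOT, and another IDENT whose text is the complete pct_vac, while tokenize("pct_contains") yields a single PCT_CONTAINS token. Input such as "abc _ abc" is rejected, because whitespace can no longer appear inside a field segment.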