Error recovery in byacc/j and JFlex using the error token, as in yacc

I am writing a compiler for a small language using byacc/j and JFlex. I have no problem finding the first error in a given input file; the problem is that I can't find any further errors. I used to use yacc and lex. There I put the special built-in 'error' token at the end of some grammar rules and called 'yyerrok' to simply continue parsing and find more errors. In byacc/j I can't find anything like that: yyerrok does not work, and byacc/j does not recognize it. Any suggestions for finding more than one error with byacc/j? Or do 'error' and 'yyerrok' exist in byacc/j?

The only thing that yyerrok does is reset the count of tokens since the last error notification. Yacc parsers suppress error messages in the first three tokens after an error recovery, to prevent cascading error messages.
Using yyerrok -- or setting yyerrflag to 0 -- indicates that error recovery was successful and that error messages should now be produced. It does not have any other effect: with or without yyerrok, parsing will continue.
yyerrok is a C macro, and Java doesn't have macros. So apparently it was dropped from the Java interface. But yyerrflag exists as a parser class member and you should be able to just set it to zero in a parser action.
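To make the answer concrete, here is a simplified toy model of how a yacc-style parser uses yyerrflag. It is not the code byacc/j actually generates. A syntax error is reported only when yyerrflag is 0. Detecting an error sets the flag to 3, each successfully shifted token decrements it, and yyerrok simply sets it back to 0. The class ErrorFlagModel and its methods are hypothetical. Real parsers also pop states and discard tokens during recovery, which is omitted here.

```java
// Simplified, hypothetical model of yacc-style error-message suppression.
// It is NOT byacc/j's generated parser; it only mimics how yyerrflag is used.
class ErrorFlagModel {
    int yyerrflag = 0;  // 0 = normal parsing; > 0 = still recovering from an error
    final java.util.List<String> messages = new java.util.ArrayList<>();

    // The parser detected a syntax error.
    void syntaxError(String msg) {
        if (yyerrflag == 0) {
            messages.add(msg);  // report only when not already recovering
        }
        yyerrflag = 3;          // suppress reports until 3 tokens have been shifted
    }

    // The parser successfully shifted a token.
    void shift() {
        if (yyerrflag > 0) {
            yyerrflag--;
        }
    }

    // What yyerrok amounts to: declare recovery finished immediately.
    void yyerrok() {
        yyerrflag = 0;
    }

    public static void main(String[] args) {
        ErrorFlagModel p = new ErrorFlagModel();
        p.syntaxError("error 1");  // reported
        p.shift();
        p.syntaxError("error 2");  // suppressed: only 1 token shifted since error 1
        p.yyerrok();               // corresponds to an action like { yyerrflag = 0; }
        p.syntaxError("error 3");  // reported again
        System.out.println(p.messages);  // prints [error 1, error 3]
    }
}
```

In a byacc/j grammar, the equivalent of yyerrok would then be an action on an error rule, for example something like stmt : error ';' { yyerrflag = 0; } ;. This assumes that yyerrflag is accessible from parser actions, as the answer states.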

Related

Bison: syntax error processing, unexpected and undefined <token>

I want to handle undefined and unexpected token errors in the yyerror function (or in another function, if that's possible).
For example, I get this error message from Bison:
...
LAC: checking lookahead EXECSQL: S4
Error: popping nterm component_list ()
Stack now 0
Cleanup: discarding lookahead token $undefined ()
Stack now 0
ERRSTAT = "%X0000002C"
But I want to print which token was not recognized, and its line number. Is it possible to implement this in Bison, and how?
The special token $undefined is reported when yylex returns a token number which doesn't appear in any parser rule. Most of the time, that's the result of the lexer fallback rule:
. { return yytext[0]; }
But it can also happen if you declare a token in your parser file, and the lexer returns that token, but the token is never actually used in any rule.
Unused tokens don't have names, in the sense that the array of names which Bison includes in your parser doesn't include unused tokens, and so there's no way to look up what the token name originally was. You can, however, often get the token number from the variable yychar. If that number is greater than 0 and less than 256, then the token is probably a single-character token, and you could use that to print an additional error message. However, there's no simple way to modify the error message generated by Bison's verbose error messages; if you're using that feature, you'll still see the invalid token message.
In order to print line numbers, you only need to enable line number counting in the lexical scanner, using
%option yylineno
in your Flex (.l) file. Then you can print the value of yylineno in yyerror. (If you're using a "pure" (reentrant) scanner, then yylineno will be in the yyscan_t object. In the normal use case where that object is an extra parser argument, it will also be available inside yyerror.)
I know that the above is a bit confusing because there are a lot of different code-generation options with slightly different behaviours. You didn't specify the particular options you're using, so the answer is a bit generic.
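The two suggestions above, checking whether the token number in yychar lies between 1 and 255 and printing the line number, can be combined into one extra diagnostic. The following is a minimal, language-neutral sketch written in Java. In a real Bison/C parser the same logic would go directly into yyerror. The class TokenDiagnostics and its method describe are hypothetical. The caller is assumed to pass in the value of yychar and the current line number (yylineno in a C/Flex setup).

```java
// Hypothetical helper: turns a token number (e.g. the value of yychar)
// and a line number into an extra diagnostic line.
final class TokenDiagnostics {
    private TokenDiagnostics() {}

    static String describe(int tokenNumber, int line) {
        String what;
        if (tokenNumber > 0 && tokenNumber < 256) {
            // Probably a single-character token, e.g. one returned by the
            // lexer fallback rule  . { return yytext[0]; }
            what = "unexpected character '" + (char) tokenNumber + "'";
        } else {
            // Unused declared tokens have no name in the parser's tables,
            // so only the number can be reported.
            what = "unexpected token number " + tokenNumber;
        }
        return "line " + line + ": " + what;
    }

    public static void main(String[] args) {
        System.out.println(describe('@', 17));  // prints line 17: unexpected character '@'
        System.out.println(describe(300, 4));   // prints line 4: unexpected token number 300
    }
}
```

Token numbers of 0 or below (for example end of input) fall into the second branch. In practice they would probably deserve their own wording.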

How to get concise syntax error messages from grako/TatSu

If the input to a grako/TatSu-generated parser has a syntax error, such as 3 + / 3 given to the calc.py example, one gets a long list of Python calling sequences (a stack trace) in addition to the relevant part:
3 + / 3
^
I could use try/except constructs, but then I lose the relevant part of the error message as well.
I would like to use grako/TatSu to parse grammar rules for a rule compiler, and I appreciate the possibility of separating syntax and semantics cleanly. The users would be quite annoyed by the excessive error messages. Is there a way to get clean error messages?
This should be the same as in any Python program. If you let the exception escape main(), then a stack trace will be printed. Instead, you can write:
try:
    do_parse()
except Exception as e:
    print(str(e))

How to serialize/deserialize YAML::Binary?

UPDATE
It now seems that anything put in a vector breaks. I have tried char and (u)int8/16/32, and they all generate some kind of error. I'm a bit perplexed. There may be an error in my code, but I'm not sure what the YAML should look like, so I'm probably not doing a very good job of finding where the data becomes incorrect.
Is YAML::Binary from yaml-cpp finished yet? I've tried serializing my data as ints, but yaml-cpp seems to confuse ints and chars, and this generally never works. So now I'm trying to use YAML::Binary instead, but I get an error on the other side when I try to recover the YAML::Binary node. Specifically, this chunk fails:
3: 0\n6: WAUAAAAAAABYBQAAAAAAAP////84UBspV0FVQUFBQUFBQUJZQlFBQUFBQUFBUC8vLy84NFVCc3BWMEZWUVVGQlFVRkJRVUpaUWxGQlFVRkJRVUZCVUM4dkx5ODRORlZDYzNCV01FWldVVlZHUWxGVlJrSlJWVXBhVVd4R1FsRlZSa0pSVlVaQ1ZVTTRka3g1T0RST1JsWkRZek5DVjAxRldsZFZWbFpIVVd4R1ZsSnJTbEpXVlhCaFZWZDRSMUZzUmxaU2EwcFNWbFZhUTFaVlRUUmthM2cxVDBSU1QxSnNXa1JaZWs1RFZqQXhSbGRzWkZaV2JGcElWVmQ0UjFac1NuSlRiRXBYVmxoQ2FGWldaRFJTTVVaelVteGFVMkV3Y0ZOV2JGWmhVVEZhVmxSVVVtdGhNMmN4VkRCU1UxUXhTbk5YYTFKYVpXczFSRlpxUVhoU2JHUnpXa1phVjJKR2NFbFdWbVEwVWpGYWMxTnVTbFJpUlhCWVZteG9RMkZHV2xkYVJGSlRUVlZhZWxWdGVHRlZNa1YzWTBaT1YySkdXbWhWVkVaaFZteFNWVlZ0ZEdoTk1tTjRWa1JDVTFVeFVYaFRiazVZWVRGS1lWcFhjekZTUmxweFVWaG9VMkpIVW5wWGExcGhWakpLUjJORmJGZFdiVkV3VldwR1lXTXhUblZUYkZKcFVsaENXVlp0ZUc5Uk1rWkhWMnhrWVZKR1NsUlVWbFpoWld4V2RHVkhSbFpOYTFZeldUQmFUMVl5U2tkWGJXaFdWa1ZhYUZadGVGTldWbFowWkVkb1RrMXRUalJXYTFKRFZURlZlRlZZYUZSaWF6VlpXVlJHUzFsV2NGaGpla1pUVW14d2VGVldhRzlWTWtwSVZXNXdXR0V4Y0doV2FrcExVakpPUm1KR1pGZGlWa1YzVmxkd1IxbFhUWGhVYmxaVVlrWktjRlZzYUVOWFZscDBaVWM1VWsxcldraFdNbmhyV1ZaS1IxTnNVbFZXYkZwb1dsZDRWMlJIVmtoU2JGcE9ZVEZaZWxkVVFtRlVNVmw1VTJ0a1dHSlhhRmRXYTFaaFlVWmFkR1ZHVGxkV2JGb3dXa1ZrYjFSck1YUlVhbEpYWVRGS1JGWlVSbFpsUmxaWllVWlNhV0Y2VmxwWFZsSkhVekZzVjJOR2FHcGxhMXBVVlcxNGQyVkdWbGRoUnpsV1RXdHdTVlpYTlhkWFIwVjRZMGRvVjJGcmNFeFZha3BQVW0xS1IxcEdaR2xXYTFZelZteGtkMUl4YkZoVVdHaFZZbXhhVlZscldrdGpSbFp6WVVWT1dGWnNjREJhVldNMVZXc3hjbGRyYUZkTmJtaHlWMVphUzFJeA==\n7: /USER_NAMES/src/sockets/rsc/atkrscs.tar.gz\n1: 0\n4: 0\n5: 651633\n2: 0
with:
terminate called after throwing an instance of 'YAML::ParserException'
what(): yaml-cpp: error at line 7, column 7: unknown escape character:
What should I do? Is there another way to send/receive binary? Did I do something wrong?

Debugging JavaScript runtime syntax errors

Short: I am looking for a way to get the text of the script that was evaluated and caused a syntax error from within the context of window.onerror.
Long:
The full scenario involves a PhoneGap application and the PushNotifications plugin.
When a push message is sent to the device, a JavaScript error is caught by window.onerror
with the text "SyntaxError: Expected token '}'".
The reported line number is 1 (as it usually is when dealing with evaluated code).
The plugin executes its code like this:
NSString * jsCallBack = [NSString stringWithFormat:@"%@(%@);", self.callback, jsonStr];
[self.webView stringByEvaluatingJavaScriptFromString:jsCallBack];
I believe, but am not 100% sure, that this is the code PhoneGap Build is pushing;
more of it can be seen here: https://github.com/phonegap-build/PushPlugin/blob/master/src/ios/PushPlugin.m#L177
self.callback is a string I pass to the plugin, and jsonStr is (supposed to be) an object describing the push message.
When I passed the string alert('a');// as the parameter that ends up being self.callback, I did get the alert and no syntax error. Now I am trying to understand what jsonStr evaluates to, so that maybe I can find a way around it, or figure out whether it is somehow my fault (maybe because of the content I am sending in the push notification).
I also tried looking at the last item of the document's $('script') collection, hoping that stringByEvaluatingJavaScriptFromString might create a new script block, but that does not seem to be the case.
Furthermore, in window.onerror I also tried to get the caller
using var c = window.onerror.caller || window.onerror.arguments.caller; but this returns undefined.
As I stated before, I am looking for ideas on how to determine exactly what is causing the syntax error, possibly by getting hold of the entire block of script that was being evaluated when the syntax error happened.

ANTLR reports error and I think it should be able to resolve input with backtracking

I have a simple grammar that works for the most part, but at one place it reports error and I think it shouldn't, because it can be resolved using backtracking.
Here is the portion that is problematic.
command: object message_chain;
object: ID;
message_chain: unary_message_chain keyword_message?
| binary_message_chain keyword_message?
| keyword_message;
unary_message_chain: unary_message+;
binary_message_chain: binary_message+;
unary_message: ID;
binary_message: BINARY_OPERATOR object;
keyword_message: (ID ':' object)+;
This is a simplified version; object is more complex (it can be the result of another command, a raw value, and so on, but that part works fine). The problem is in message_chain, in the first alternative. For input like obj unary1 unary2 it works fine, but for input like obj unary1 unary2 keyword1:obj2 it tries to match keyword1 as a unary message and fails when it reaches the :. I would expect the parser to backtrack in this situation, see the :, and recognize a keyword message.
If I make keyword_message non-optional it works fine, but I need it to be optional.
The parser does find the keyword message in the second alternative (binary_message_chain) and in the third alternative (just keyword_message), so something like this gives good results: 1 + 2 + 3 Keyword1:Value
What am I missing? Backtracking is set to true in options and it works fine in other cases in the same grammar.
Thanks.
This is not really a case for PEG-style backtracking, because upon failure that returns to decision points in uncompleted derivations only. For input obj unary1 unary2 keyword1:obj2, with a single token lookahead, keyword1 could be consumed by unary_message_chain. The failure may not occur before keyword_message, and next to be tried would be the second alternative of message_chain, i.e. binary_message_chain, thus missing the correct parse.
However as this grammar is LL(2), it should be possible to extend lookahead to avoid consuming keyword1 from within unary_message_chain. Have you tried explicitly setting k=2, without backtracking?
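The LL(2) decision described above can be illustrated with a small hand-written recursive-descent recognizer. This is a sketch, not what ANTLR generates. The class MessageChainParser is hypothetical, and tokens are assumed to be pre-split strings. object is reduced to a plain ID as in the simplified grammar, so numeric objects like those in 1 + 2 + 3 are not handled. Only +, -, * and / are treated as BINARY_OPERATOR. The key point is in the unary loop: an ID is consumed as a unary message only if the token after it (LA(2)) is not ':'.

```java
// Hypothetical recursive-descent sketch of the LL(2) decision in message_chain.
// Simplified grammar: object: ID; BINARY_OPERATOR limited to + - * /.
final class MessageChainParser {
    private final java.util.List<String> tokens;
    private int pos = 0;
    final java.util.List<String> trace = new java.util.ArrayList<>();  // recognized pieces, in order

    MessageChainParser(java.util.List<String> tokens) {
        this.tokens = tokens;
    }

    // LA(k): the k-th lookahead token (1-based), or "<EOF>" past the end.
    private String la(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    private static boolean isId(String t) {
        return Character.isLetter(t.charAt(0));
    }

    private static boolean isBinaryOperator(String t) {
        return t.equals("+") || t.equals("-") || t.equals("*") || t.equals("/");
    }

    // ID ':' starts a keyword message; recognizing it needs two tokens of lookahead.
    private boolean atKeyword() {
        return isId(la(1)) && la(2).equals(":");
    }

    private String expectId() {
        String t = la(1);
        if (!isId(t)) throw new IllegalStateException("expected ID but got " + t);
        pos++;
        return t;
    }

    private void expectToken(String expected) {
        if (!la(1).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but got " + la(1));
        }
        pos++;
    }

    // command: object message_chain;
    void command() {
        trace.add("object " + expectId());
        messageChain();
        if (!la(1).equals("<EOF>")) throw new IllegalStateException("trailing input: " + la(1));
    }

    // message_chain: unary_message_chain keyword_message?
    //              | binary_message_chain keyword_message?
    //              | keyword_message;
    private void messageChain() {
        if (isId(la(1)) && !atKeyword()) {
            // unary_message_chain: stop before an ID that is followed by ':'
            while (isId(la(1)) && !atKeyword()) {
                trace.add("unary " + expectId());
            }
        } else if (isBinaryOperator(la(1))) {
            // binary_message_chain: (BINARY_OPERATOR object)+
            while (isBinaryOperator(la(1))) {
                String op = la(1);
                pos++;
                trace.add("binary " + op + " " + expectId());
            }
        } else {
            keywordMessage();  // third alternative: keyword_message is mandatory
            return;
        }
        if (atKeyword()) {
            keywordMessage();  // optional keyword_message after a unary/binary chain
        }
    }

    // keyword_message: (ID ':' object)+;
    private void keywordMessage() {
        do {
            String keyword = expectId();
            expectToken(":");
            trace.add("keyword " + keyword + " " + expectId());
        } while (atKeyword());
    }
}
```

For example, running command() on the tokens obj unary1 unary2 keyword1 : obj2 records two unary messages followed by the keyword message keyword1 obj2, instead of consuming keyword1 as a third unary message. No backtracking is involved; the decision is made purely from LA(1) and LA(2).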