How to find parsing error with ParseKit framework - objective-c

I was wondering whether there is a way to find out how far into an assembly a PKParser got before encountering a syntax error.
reference: http://parsekit.com/
I'm using a grammar that basically describes a prefix notation expression language.
For example:
Given a standard prefix-notation expression grammar and the string "(+ a - b c))", I'd like to retrieve the tokens that were matched, [(, +, a], so I can give the user some idea of where to look to fix their error. However, neither completeMatchFor: nor bestMatchFor: returns anything I can use to find this information.
Ideally I'd like to say that a '(' was expected, but it's not necessary for a grammar as simple as what I'm using.
From the book mentioned as the user manual, it seemed as if I would need to create a custom parser for this, but I was hoping that maybe I'd simply missed something in the framework.
Thoughts?

Developer of ParseKit here.
There are two features in ParseKit which can be used to help provide user-readable hints describing parse errors encountered in input.
-[PKParser bestMatchFor:]
The PKTrack class
It sounds like you're aware of the -bestMatchFor: method even if it's not doing what you expect in this case.
I think the PKTrack class will be more helpful here. As described in Metsker's book, PKTrack is exactly like PKSequence except that its subparsers are required: if they are not all matched, an error is thrown (with a helpful error message).
So here's a grammar for your example input:
@start = '(' expr ')' | expr;
expr = ('+' | '-') term term;
term = '(' expr ')' | Word;
The symbols listed contiguously in each production form a Sequence, but could instead form a Track.
The benefit of changing these Sequences to be Tracks is that an NSException will be thrown with a human-readable parse error message if the input doesn't match. The downside is that you must now wrap all usages of your factory-generated parser in a try/catch block to catch these Track exceptions.
The problem currently (or before now, at least) is that the PKParserFactory never produced a parser using Tracks. Instead, it would always use Sequences.
So I've just added a new option at the head of trunk on Google Code (you'll need to update):
#define USE_TRACK 0
in
PKParserFactory.m
It's 0 by default. If you change this define to 1, Tracks will be used instead of Sequences. So given the grammar above and invalid input like this:
(+ a - b c))
and this client code:
NSString *g = @"@start = '(' expr ')' | expr; expr = ('+' | '-') term term; term = '(' expr ')' | Word;"; // the grammar above
PKParser *p = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
NSString *s = @"(+ a - b c))";
@try {
    PKAssembly *res = [p parse:s];
    NSLog(@"res %@", res);
}
@catch (NSException *exception) {
    NSLog(@"Parse Error:%@", exception);
}
you will get a nice-ish human-readable error:
Parse Error:
After : ( + a
Expected : Alternation (term)
Found : -
Hope that helps.
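To make the Track behaviour concrete outside ParseKit, here is a minimal sketch in C of a hand-written recursive-descent parser for the same grammar. It is not ParseKit code and not how PKTrack is implemented. Each rule is "required" in the Track sense: the first mismatch records the token position and what was expected, and every caller then unwinds. The names Parser, tokenize, parse_start and print_error are hypothetical, and the tokenizer assumes the input contains only parentheses, '+', '-' and whitespace-separated words.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64

/* One token: its text and its character offset in the input. */
typedef struct {
    char text[32];
    size_t offset;
} Token;

typedef struct {
    Token toks[MAX_TOKENS];
    size_t n;             /* number of tokens */
    size_t pos;           /* index of the next token to consume */
    int failed;           /* set by the first mismatch, like a Track throwing */
    size_t fail_pos;      /* token index where the mismatch happened */
    const char *expected; /* what the grammar wanted at fail_pos */
} Parser;

static int is_punct(char c) { return c != '\0' && strchr("()+-", c) != NULL; }

/* Split the input into '(' ')' '+' '-' and words, recording offsets. */
static void tokenize(Parser *p, const char *s) {
    memset(p, 0, sizeof *p);
    for (size_t i = 0; s[i] != '\0' && p->n < MAX_TOKENS;) {
        if (isspace((unsigned char)s[i])) { i++; continue; }
        Token *t = &p->toks[p->n++];
        size_t len = 0;
        t->offset = i;
        if (is_punct(s[i])) {
            t->text[len++] = s[i++];
        } else {
            while (s[i] != '\0' && !isspace((unsigned char)s[i]) && !is_punct(s[i]) &&
                   len < sizeof t->text - 1)
                t->text[len++] = s[i++];
        }
        t->text[len] = '\0';
    }
}

static const char *peek(const Parser *p) {
    return p->pos < p->n ? p->toks[p->pos].text : NULL;
}

/* Record only the first failure, then make every caller unwind with 0. */
static int fail(Parser *p, const char *expected) {
    if (!p->failed) {
        p->failed = 1;
        p->fail_pos = p->pos;
        p->expected = expected;
    }
    return 0;
}

/* Match one literal token or fail. */
static int literal(Parser *p, const char *lit) {
    const char *t = peek(p);
    if (t != NULL && strcmp(t, lit) == 0) { p->pos++; return 1; }
    return fail(p, lit);
}

static int expr(Parser *p);

/* term = '(' expr ')' | Word; */
static int term(Parser *p) {
    const char *t = peek(p);
    if (t != NULL && strcmp(t, "(") == 0) {
        p->pos++;
        return expr(p) && literal(p, ")");
    }
    if (t != NULL && !is_punct(t[0])) { p->pos++; return 1; }
    return fail(p, "term");
}

/* expr = ('+' | '-') term term; */
static int expr(Parser *p) {
    const char *t = peek(p);
    if (t == NULL || (strcmp(t, "+") != 0 && strcmp(t, "-") != 0))
        return fail(p, "'+' or '-'");
    p->pos++;
    return term(p) && term(p);
}

/* @start = '(' expr ')' | expr;  then require the end of input. */
static int parse_start(Parser *p) {
    const char *t = peek(p);
    int ok;
    if (t != NULL && strcmp(t, "(") == 0) {
        p->pos++;
        ok = expr(p) && literal(p, ")");
    } else {
        ok = expr(p);
    }
    if (ok && p->pos < p->n) return fail(p, "end of input");
    return ok;
}

/* Print "After / Expected / Found" in the spirit of the Track message. */
static void print_error(const Parser *p) {
    printf("After :");
    for (size_t i = 0; i < p->fail_pos; i++) printf(" %s", p->toks[i].text);
    if (p->fail_pos < p->n)
        printf("\nExpected : %s\nFound : %s (offset %zu)\n", p->expected,
               p->toks[p->fail_pos].text, p->toks[p->fail_pos].offset);
    else
        printf("\nExpected : %s\nFound : end of input\n", p->expected);
}
```

On "(+ a - b c))", parse_start returns 0 with the failure recorded at the "-" token (offset 5) and "term" as the expected item, so print_error reports After : ( + a, Expected : term, Found : - (offset 5), the same shape as the ParseKit message.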

I'm wrestling with this issue too. In order for -bestMatchFor: to be useful in identifying error conditions, there should be methods in PKAssembly's public interface indicating if there are more tokens/characters to be parsed. -completeMatchFor: is able to determine error state because it has access to the private -hasMore method. Perhaps PKAssembly's -hasMore method should be public.
I looked at PKTrack but since I want to handle errors programmatically, it wasn't useful to me.
My conclusion is I either write my own custom Track parser or I alter the framework and expose -hasMore. Are there other ways to handle errors?
Until I figure out a better way to detect errors, I've added the following to the file containing the implementation of my custom parser:
@interface PKAssembly ()
- (BOOL)hasMore;
- (id)peek;
@end
@implementation PMParser
...
@end
In my parse method:
PKAssembly *a = [PKTokenAssembly assemblyWithString:s];
PKAssembly *best = [self bestMatchFor:a];
PMParseNode *node = nil;
BOOL error = NO;
NSUInteger errorOffset = 0;
if (best == nil) // Anything recognized?
{
    error = YES;
}
else
{
    if ([best hasMore]) // Partial recognition?
    {
        PKToken *t = [best peek];
        error = YES;
        errorOffset = t.offset;
    }
    node = [best pop];
}
If an error occurred, errorOffset will contain the location of the unrecognized token.
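The same check can be sketched without exceptions, which suits handling errors programmatically. This is a hypothetical C analogue of the Objective-C code above, not ParseKit code: best_match returns how many tokens the longest successful alternative consumed (playing the role of -bestMatchFor:), and find_error then checks whether any tokens are left over (the role of -hasMore and -peek) and reports the offset of the first one. The grammar is the prefix-expression grammar from the answer above; all names here are assumptions.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token: kind is '(' ')' '+' '-', or 'w' for a word. */
typedef struct {
    char kind;
    size_t offset;
} Tok;

static int is_punct(char c) { return c != '\0' && strchr("()+-", c) != NULL; }

static size_t lex(const char *s, Tok *out, size_t max) {
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0' && n < max;) {
        if (isspace((unsigned char)s[i])) { i++; continue; }
        out[n].offset = i;
        if (is_punct(s[i])) {
            out[n++].kind = s[i++];
        } else {
            out[n++].kind = 'w';
            while (s[i] != '\0' && !isspace((unsigned char)s[i]) && !is_punct(s[i])) i++;
        }
    }
    return n;
}

/* Each matcher returns the token index just after its match, or -1. */
static long expr(const Tok *t, size_t n, long i);

/* term = '(' expr ')' | Word; */
static long term(const Tok *t, size_t n, long i) {
    if (i < 0 || (size_t)i >= n) return -1;
    if (t[i].kind == 'w') return i + 1;
    if (t[i].kind != '(') return -1;
    i = expr(t, n, i + 1);
    if (i < 0 || (size_t)i >= n || t[i].kind != ')') return -1;
    return i + 1;
}

/* expr = ('+' | '-') term term; */
static long expr(const Tok *t, size_t n, long i) {
    if (i < 0 || (size_t)i >= n || (t[i].kind != '+' && t[i].kind != '-')) return -1;
    return term(t, n, term(t, n, i + 1));
}

/* @start = '(' expr ')' | expr;  keep whichever alternative consumed more. */
static long best_match(const Tok *t, size_t n) {
    long a = -1;
    if (n > 0 && t[0].kind == '(') {
        a = expr(t, n, 1);
        a = (a >= 0 && (size_t)a < n && t[a].kind == ')') ? a + 1 : -1;
    }
    long b = expr(t, n, 0);
    return a > b ? a : b;
}

/* Returns 1 and sets *offset on error, 0 if the whole input matched. */
static int find_error(const char *s, size_t *offset) {
    Tok t[128];
    size_t n = lex(s, t, 128);
    long best = best_match(t, n);
    *offset = 0;                      /* nothing recognized: report the start */
    if (best < 0) return 1;
    if ((size_t)best < n) {           /* partial recognition: first leftover token */
        *offset = t[best].offset;
        return 1;
    }
    return 0;
}
```

For "(+ a (- b c)))" the match stops before the final ")", so find_error reports offset 13; when nothing matches at all it reports offset 0, just as the Objective-C code leaves errorOffset at 0 when best is nil.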


Yacc/bison: what's wrong with my syntax equations?

I'm writing a "compiler" of sorts: it reads a description of a game (with rooms, characters, things, etc.) Think of it as a visual version of an Adventure-style game, but with much simpler problems.
When I run my "compiler" I'm getting a syntax error on my input, and I can't figure out why. Here's the relevant section of my yacc input:
character
: char-head general-text character-insides { PopChoices(); }
;
character-insides
: LEFTBRACKET options RIGHTBRACKET
;
char-head
: char-namesWT opt-imgsWT char-desc opt-cond
;
char-desc
: general-text { SetText($1); }
;
char-namesWT
: DOTC ID WORD { AddCharacter($3, $2); expect(EXP_TEXT); }
;
opt-cond
: %empty
| condition
;
condition
: condition-reason condition-main general-text
{ AddCondition($1, $2, $3); }
;
condition-reason
: DOTU { $$ = 'u'; }
| DOTV { $$ = 'v'; }
;
condition-main
: money-conditionWT
| have-conditionWT
| moves-conditionWT
| flag-conditionWT
;
have-conditionWT
: PERCENT_SLASH opt-bang ID
{ $$ = MkCondID($1, $2, $3) ; expect(EXP_TEXT); }
;
opt-bang
: %empty { $$ = TRUE; }
| BANG { $$ = FALSE; }
;
ID: WORD
;
Things in all caps are terminal symbols, things in lower or mixed case are non-terminals. If a non-terminal ends in WT, then it "wants text". That is, it expects that what comes after it may be arbitrary text.
Background: I have written my own token recognizer in C++ because (*) I want the parser to be able to change the lexer's behavior. Two types of tokens should be matched only when the syntax expects them: FILENAME (with slashes and other non-alphanumeric characters) and TEXT, which means "all the text from here to the end of the line" (but not starting with certain keywords).
The function "expect" tells the lexer when to look for these two symbols. The expectation is reset to EXP_NORMAL after each token is returned.
I have added code to yylex that prints out the tokens as it recognizes them, and it looks to me like the tokenizer is working properly -- returning the tokens I expect.
(*) Also because I want to be able to ask the tokenizer for the column where the error occurred, and get the contents of the line being scanned at the time so I can print out a more useful error message.
Here is the relevant part of the input:
.c Wendy wendy
OK, now you caught me, what do you want to do with me?
.u %/lasso You won't catch me like that.
[
Here is the last part of the debugging output from yylex:
token: 262: DOTC/
token: 289: WORD/Wendy
token: 289: WORD/wendy
token: 292: TEXT/OK, now you caught me, what do you want to do with me?
token: 286: DOTU/
token: 274: PERCENT_SLASH/%/
token: 289: WORD/lasso
token: 292: TEXT/You won't catch me like that.
token: 269: LEFTBRACKET/
here's my error message:
: line 124, columns 3-4: syntax error, unexpected LEFTBRACKET, expecting TEXT
[
To help you understand the equations above, here is the relevant part of the description of the input syntax that I wrote the yacc code from.
// Character:
// .c id charactername,[imagename,[animationname]]
// description-text
// .u condition on the character being usable [optional]
// .v condition on the character being visible [optional]
// [
// (options)
// ]
// Conditions:
// %$[-]n Must [not] have at least n dollars
// %/[-]name Must [not] have named thing
// %t-nnn At/before specified number of moves
// %t+nnn At/after specified number of moves
// %#[-]name named flag must [not] be set
// Condition-char: $, /, t, or #, as described above
//
// Condition:
// % condition-char (identifier/int) ['/' text-if-fail ]
// description-text: Can be either on-line text or multi-line text
// On-line text is the rest of the line
Brackets mark optional non-terminals, but a bracket standing alone (represented by LEFTBRACKET and RIGHTBRACKET in the yacc) is an actual token, e.g.
// [
// (options)
// ]
above.
What am I doing wrong?
To debug parsing problems in your grammar, you need to understand the shift/reduce machine that yacc/bison produces (described in the .output file produced with the -v option), and you need to look at the trail of states that the parser goes through to reach the problem you see.
To enable debugging code in the parser (which can print the states and the shift and reduce actions as they occur), you need to compile with -DYYDEBUG or put #define YYDEBUG 1 at the top of your grammar file. The debugging code is controlled by the global variable yydebug: set it to non-zero to turn on the trace and to zero to turn it off. I often use the following in main:
#ifdef YYDEBUG
extern int yydebug;
if (char *p = getenv("YYDEBUG"))
yydebug = atoi(p);
#endif
Then you can include -DYYDEBUG in your compiler flags for debug builds and turn on the debugging code by something like setenv YYDEBUG 1 to set the envvar prior to running your program.
I suppose your syntax error message was generated by bison. What is striking is that it claims to have found a LEFTBRACKET when it expects a [. Naively, you might expect it to be satisfied with the LEFTBRACKET it found, but of course bison knows nothing about LEFTBRACKET except its numeric value, which will be some integer larger than 256.
The only reason bison might expect [ is if your grammar includes the terminal '['. But since your scanner seems to return LEFTBRACKET when it sees a [, the parser will never see '['.
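A tiny C sketch of the token-code mismatch described above. The value 269 for LEFTBRACKET is taken from the question's yylex trace; in a real build it comes from the bison-generated header. The helper functions are hypothetical and only model which token code each spelling of the rule waits for.

```c
/* Token code for LEFTBRACKET as printed in the question's yylex trace
   ("token: 269: LEFTBRACKET/"); normally defined by the bison-generated header. */
enum { LEFTBRACKET = 269 };

/* A scanner that reports '[' as the named token LEFTBRACKET. */
static int scan_bracket(char c) { return c == '[' ? LEFTBRACKET : (unsigned char)c; }

/* A rule spelled with the character literal '[' waits for token code 91. */
static int rule_accepts_literal(int token) { return token == '['; }

/* A rule spelled with LEFTBRACKET waits for token code 269. */
static int rule_accepts_named(int token) { return token == LEFTBRACKET; }
```

If the grammar does contain a literal '[' somewhere, the fix is to spell the bracket the same way on both sides: either use LEFTBRACKET in every grammar rule, or have yylex return the character '[' itself.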

What could be a reason for `_localctx` being null in an antlr4 semantic predicate?

I'm using list labels to gather tokens and semantic predicates to validate sequences in my parser grammar. E.g.
line
:
(text+=WORD | text+=NUMBER)+ ((BLANK | SKIP)+ (text+=WORD | text+=NUMBER)+)+
{Parser.validateContext(_localctx)}?
(BLANK | SKIP)*
;
where
WORD: [\u0021-\u002F\u003A-\u007E]+; // printable ASCII characters (excluding SP and numbers)
NUMBER: [\u0030-\u0039]+; // printable ASCII number characters
BLANK: '\u0020';
SKIP: '\u0020\u0020' | '\t'; // two SPs or a HT symbol
The part of Parser.validateContext used to validate the line rule would be implemented like this
private static final boolean validateContext(ParserRuleContext context) {
//.. other contexts
if(context instanceof LineContext)
return "<reference-sequence>".equals(Parser.joinTokens(((LineContext) context).text, " "));
return false;}
where Parser.joinTokens is defined as
private static String joinTokens(java.util.List<org.antlr.v4.runtime.Token> tokens, String delimiter) {
StringBuilder builder = new StringBuilder();
int i = 0, n;
if((n = tokens.size()) == 0) return "";
builder.append(tokens.get(0).getText());
while(++i < n) builder.append(delimiter + tokens.get(i).getText());
return builder.toString();}
Both are put in a @parser::members section at the beginning of the grammar file.
My problem is this: sometimes the _localctx reference is null and I receive "no viable alternative" errors. These are probably caused by the failing predicate: it guards the rule, and no alternative is given.
Is there a reason–potentially an error on my part–why _localctx would be null?
UPDATE: The answer to this question seems to suggest that semantic predicates are also called during prediction. Maybe during prediction no context is created and _localctx is set to null.
The semantics of _localctx in a predicate are not defined. Allowable behavior includes, but is not limited to, the following (and may change in any release):
Failing to compile (no identifier with that name)
Using the wrong context object
Not having a context object (null)
To reference the context of the current rule from within a predicate, you need to use $ctx instead.
Note that the same applies for rule parameters, locals, and/or return values which are used in a predicate. For example, the parameter a cannot be referenced as a, but must instead be $a.

Objective-C compile error: Invalid operands to binary expression

-(void)updateCharacterStatsForArmor:(RKArmor *)armor withWeapons:(RKWeapon *)weapon withHealthEffect:(int)healtheffect
{
if (armor != nil){
self.character.health = self.character.health - self.character.armor.health + armor.health;
self.character.armor = armor;
}
else if (weapon != nil){
// The problematic line:
self.character.damage = self.character.damage - self.character.weapon.damage + weapon.damage;
self.character.weapon = weapon;
}
else if (healtheffect != 0){
self.character.health = self.character.health + healtheffect;
}
else {
self.character.health = self.character.health + self.character.armor.health;
self.character.damage = *(self.character.damage + self.character.weapon.damage);
}
}
@end
The line with the error is marked in the code snippet. The error says "invalid operands to binary expression", with operand types int and int *.
Would it be best to restart the whole thing?
You probably defined one of the damage properties you're using as an int* instead of int. Check your character and weapon classes for that. I'd suspect self.character.damage.
If the line you indicated really is the problem, it looks like you've probably declared the damage property of the character class to return int * instead of int. It's an easy mistake to make, because properties that point to objects are always pointers. int, however, is not an object type, so there's usually no need to store a pointer to one in a property.
The line you say is a problem is:
self.character.damage = self.character.damage - self.character.weapon.damage + weapon.damage;
But later on you have:
self.character.damage = *(self.character.damage + self.character.weapon.damage);
It looks like this second line is some kind of attempt to avoid a compiler error, but it doesn't seem to make a lot of sense. You'd only dereference the result of the addition if you were doing pointer arithmetic, but that doesn't make much sense for a property called damage.
To fix all this, take a look at your character class declaration. You'll probably see something like:
@property(...) int *damage;
Remove that * and make sure the attribute is assign, like this:
@property(assign) int damage;
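To illustrate the fix, here is the same update logic as a C sketch in which health and damage are plain int fields, as they would be after the property change above. The Armor, Weapon and Character structs and update_stats are hypothetical stand-ins for the RKArmor/RKWeapon/character classes. The armor_health and weapon_damage helpers return 0 for a missing item, mirroring the fact that sending a message to nil in Objective-C returns 0.

```c
#include <stddef.h>

typedef struct { int health; } Armor;
typedef struct { int damage; } Weapon;

typedef struct {
    int health;     /* plain int, not int * */
    int damage;     /* plain int, not int * */
    Armor *armor;   /* objects are referenced through pointers... */
    Weapon *weapon; /* ...but their int stats are not */
} Character;

/* Like messaging nil in Objective-C, a missing item contributes 0. */
static int armor_health(const Armor *a) { return a != NULL ? a->health : 0; }
static int weapon_damage(const Weapon *w) { return w != NULL ? w->damage : 0; }

static void update_stats(Character *c, Armor *armor, Weapon *weapon, int health_effect) {
    if (armor != NULL) {
        c->health = c->health - armor_health(c->armor) + armor->health;
        c->armor = armor;
    } else if (weapon != NULL) {
        /* The line that failed to compile: with int fields it is ordinary arithmetic. */
        c->damage = c->damage - weapon_damage(c->weapon) + weapon->damage;
        c->weapon = weapon;
    } else if (health_effect != 0) {
        c->health = c->health + health_effect;
    } else {
        /* No dereference needed: the original *( ... ) is simply dropped. */
        c->health = c->health + armor_health(c->armor);
        c->damage = c->damage + weapon_damage(c->weapon);
    }
}
```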
Also, it's not clear how pirates are involved. Perhaps ye have been a mite hasty in makin' yer query? Arrr!

Null Pointer Exception - ANTLR TreeWalker

I keep getting a NullPointerException in my TreeWalker, but I can't find out why.
I can't post the whole grammar because it's far too long.
This is the rule in the tree walker where ANTLRWorks says the problem is:
collection_name returns [MyType value]
: ID { $value = (MyType) database.get($collection_name.text); }
;
Note that database is a HashMap.
Thank you!
The following is more "readable" and does exactly the same as your original rule:
collection_name returns [MyType value]
: ID { $value = (MyType) database.get($ID.text); }
;
Perhaps do some sanity checks:
collection_name returns [MyType value]
: ID
{
Object v = database.get($ID.text);
if(v == null) {
throw new RuntimeException($ID.text + " unknown in database!");
}
$value = (MyType) v;
}
;
EDIT
As you already found out, accessing the .text attribute of a rule is not possible in a tree grammar (only in a parser grammar). In tree grammars, every rule is of type Tree and has .start and .end attributes instead. Tokens can be accessed the same way in both parser and tree grammars, so $ID.text works fine.

Parse VCALENDAR (ics) with Objective-C

I'm looking for an easy way to parse VCALENDAR data with objective-c. Specifically all I am concerned with is the FREEBUSY data (See below):
BEGIN:VCALENDAR
VERSION:2.0
METHOD:REPLY
PRODID:-//CALENDARSERVER.ORG//NONSGML Version 1//EN
BEGIN:VFREEBUSY
UID:XYZ-DONT-CARE
DTSTART:20090605T070000Z
DTEND:20090606T070000Z
ATTENDEE:/principals/__uids__/ABC1234-53D8-4079-8392-01274F97F5E1/
DTSTAMP:20090605T075430Z
FREEBUSY;FBTYPE=BUSY:20090605T170000Z/20090605T200000Z,20090605T223000Z/20
090606T003000Z
FREEBUSY;FBTYPE=BUSY-UNAVAILABLE:20090605T070000Z/20090605T150000Z,2009060
6T010000Z/20090606T070000Z
ORGANIZER:/principals/__uids__/ABC1234-53D8-4079-8392-01274F97F5E1/
END:VFREEBUSY
END:VCALENDAR
I've tried parsing it using componentsSeparatedByString:@"\n", but there is a \n in part of the FREEBUSY data, so it doesn't parse correctly.
Is there something easy that I'm missing?
The \n in the middle of the FREEBUSY data is part of the iCalendar spec: according to RFC 2445, a line break (CRLF) followed by a single space or tab is the correct way to split ("fold") long lines, so you'll probably see a lot of this when scanning FREEBUSY data.
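Unfolding the lines before splitting them avoids the problem. A minimal C sketch, assuming the input is the raw calendar text: it removes every line break (CRLF, or a bare LF as in the sample above) that is immediately followed by a single space or tab, following the folding rule in RFC 2445 section 4.1. The function name ical_unfold is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Unfold iCalendar content lines: a line break followed by one space or tab
   is removed, joining the two pieces. Accepts CRLF as the spec requires and
   bare LF as often seen in practice.
   Returns a newly allocated string the caller must free(), or NULL. */
char *ical_unfold(const char *in) {
    size_t len = strlen(in);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        size_t brk = 0; /* length of the line break starting at i, if any */
        if (in[i] == '\r' && in[i + 1] == '\n') brk = 2;
        else if (in[i] == '\n') brk = 1;
        if (brk != 0 && (in[i + brk] == ' ' || in[i + brk] == '\t')) {
            i += brk; /* skip the break; the loop increment skips the space/tab */
            continue;
        }
        out[j++] = in[i];
    }
    out[j] = '\0';
    return out;
}
```

After unfolding, splitting on newlines gives one property per line; with CRLF input, each piece will still end in \r if you split on "\n" alone.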
As Nathan suggests, an NSScanner may be all you need if the data you're expecting will be reasonably consistent. There are a number of vagaries in iCalendar, though, so I often find myself using libical to parse ics info. A quick-and-dirty example of parsing this data using libical:
NSString *caldata = @"BEGIN:VCALENDAR\nVERS....etc";
icalcomponent *root = icalparser_parse_string([caldata cStringUsingEncoding:NSUTF8StringEncoding]);
if (root) {
    icalcomponent *c = icalcomponent_get_first_component(root, ICAL_VFREEBUSY_COMPONENT);
    while (c) {
        icalproperty *p = icalcomponent_get_first_property(c, ICAL_FREEBUSY_PROPERTY);
        while (p) {
            icalvalue *v = icalproperty_get_value(p);
            // This gives: 20090605T170000Z/20090605T200000Z
            NSLog(@"FREEBUSY Value: %@", [NSString stringWithUTF8String:icalvalue_as_ical_string(v)]);
            icalparameter *m = icalproperty_get_first_parameter(p, ICAL_FBTYPE_PARAMETER);
            while (m) {
                // This gives: FBTYPE=BUSY
                NSLog(@"Parameter: %@", [NSString stringWithUTF8String:icalparameter_as_ical_string(m)]);
                m = icalproperty_get_next_parameter(p, ICAL_FBTYPE_PARAMETER);
            }
            p = icalcomponent_get_next_property(c, ICAL_FREEBUSY_PROPERTY);
        }
        c = icalcomponent_get_next_component(root, ICAL_VFREEBUSY_COMPONENT);
    }
    icalcomponent_free(root);
}
Documentation for libical is in the project download itself (see UsingLibical.txt). There's also this lovely tutorial on shipping libical in your application bundle.
Take a look at NSScanner.
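If you go the NSScanner (or plain C) route instead of libical, the FREEBUSY value itself is simple once the line is unfolded: the text after the first ':' is a comma-separated list of start/end periods. A minimal C sketch, assuming unfolded input and only the explicit start/end form seen in the sample. RFC 2445 also allows a start/duration form, which this sketch rejects rather than parses. BusyPeriod and parse_freebusy are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* One busy period, kept as raw UTC date-time strings, e.g. "20090605T170000Z". */
typedef struct {
    char start[17]; /* 16 characters plus terminator */
    char end[17];
} BusyPeriod;

/* Rough shape check for "YYYYMMDDTHHMMSSZ". */
static int is_utc_datetime(const char *s) {
    return strlen(s) == 16 && s[8] == 'T' && s[15] == 'Z';
}

/* Parse "start/end,start/end,..." into at most max periods.
   Returns the number of periods, or -1 on malformed or unsupported input. */
int parse_freebusy(const char *value, BusyPeriod *out, int max) {
    int n = 0;
    const char *p = value;
    while (*p != '\0') {
        int used = 0;
        if (n == max) return -1; /* more periods than the caller has room for */
        /* Read up to the slash, then up to the next comma; %n records how far we got. */
        if (sscanf(p, "%16[^/]/%16[^,]%n", out[n].start, out[n].end, &used) != 2) return -1;
        if (!is_utc_datetime(out[n].start) || !is_utc_datetime(out[n].end)) return -1;
        p += used;
        n++;
        if (*p == ',') p++;
        else if (*p != '\0') return -1;
    }
    return n;
}
```

For the first FREEBUSY line in the sample (after unfolding), this yields two periods, 20090605T170000Z to 20090605T200000Z and 20090605T223000Z to 20090606T003000Z.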