I'm using ParseKit for Xcode, but this problem may well apply to most parser generators. I parse a script with a grammar and it works fine. However, I'm running into problems with the next step. Consider the grammar:
#start = line+;
line = tree;
tree = 'tree' fruits branches;
fruits = 'with' 'fruits' Number;
branches = 'with' 'branches' Number;
And the script to parse:
tree with fruits 8 with branches 12
If I then implement methods to match, fruits and branches are each matched once, as expected.
However, tree is matched twice. Why? This stops me from building the parse tree I want.
Thanks for any help!
Developer of ParseKit here. I think you might actually be asking the same question as this:
parsekit given unexpected calls to selectors
If you read through the comments I made on that question, I think you will find the answer. Let me know if not.
I haven't touched Prolog since high school, and even though I've tried to find the information, it didn't help. Below is an example that illustrates my problem:
%% everybody():- [dana, cody, bess, abby].
%% Everybody = [dana, cody, bess, abby].
likes(dana, cody).
hates(bess, dana).
hates(cody, abby).
hates(X, Y):- \+ likes(X, Y).
likes_somebody(_, []):- fail.
likes_somebody(X, [girl | others]):-
likes(X, girl) ; likes_somebody(X, others).
likes_everybody(_, []):- true.
likes_everybody(X, [girl | others]):-
likes(X, girl) , likes_everybody(X, others).
maplist(likes_somebody, [dana, cody, bess, abby], [dana, cody, bess, abby]).
How do I declare everybody to just be the list of girls? The commented lines are those which I've tried, but I've got bizarre error messages back.
This is the tutorial I followed more or less so far. I'm using GProlog, if it makes any difference. Sorry for such a basic question. GProlog's manual doesn't deal with language syntax, but I've certainly looked there. As an aside, I would be grateful for information on where to look for language documentation (as opposed to implementation documentation).
Every variable in Prolog must begin with an uppercase letter. So for starters, you want Everybody, not everybody.
Second problem: variables in Prolog are not assignable. So what you probably want to do is define a fact and use that instead:
everybody([dana, cody, bess, abby]).
Your bottom line of code is actually a fact definition and will attempt to overwrite maplist/3. What you probably want to do is put everything above that line into a file (say, called likes.pl) and then consult it ([likes].). Then you can run a query like this:
?- everybody(Everybody), maplist(likes_somebody, Everybody, Everybody).
This won't work, because likes_somebody/2 processes a list in the second argument, while maplist/3 hands it one element at a time. The base case you have for likes_everybody/2 could be written more simply as a fact:
likes_everybody(_, []).
but it still won't mean much. It simply succeeds for anything paired with the empty list:
?- likes_everybody(chicken_tacos, []).
true.
You really need a predicate to tell you if someone is a girl, like this:
girl(dana).
girl(cody).
girl(bess).
girl(abby).
Then you could do what I think you're trying to do, which is something closer to this:
likes_somebody(X) :- girl(X).
Then the maplist construction would work like this:
everybody(Everybody), maplist(likes_somebody, Everybody).
Which would simply return true. You could simplify and eliminate everybody/1 by instead using findall(Girl, girl(Girl), Everybody), but it's getting weird.
You're trying to do list processing with likes_everybody/2, but it's broken because girl is literally girl, not a variable, and others is literally others, not a list of some kind that could be the tail of another list.
I think you still have some old ideas you need to cleanse. Read some more, write some more, and your code will start to make a lot more sense.
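For readers more at home in an imperative language, here is a rough Python analogue of what likes_somebody/2 and likes_everybody/2 seem intended to do once girl and others become real variables (Girl and Others in Prolog). It is only a sketch: the LIKES set and the function names mirror the question, and Python's unpacking stands in for the [Girl | Others] pattern.

```python
# Facts from the question: only likes(dana, cody) is stated.
LIKES = {("dana", "cody")}


def likes(x, y):
    """True if the fact likes(x, y) is in the database."""
    return (x, y) in LIKES


def likes_somebody(x, people):
    """Mirror of: likes_somebody(X, [Girl | Others]) :- likes(X, Girl) ; likes_somebody(X, Others)."""
    if not people:  # empty list: nobody left to like, so fail
        return False
    girl, *others = people  # head/tail split, like [Girl | Others]
    return likes(x, girl) or likes_somebody(x, others)


def likes_everybody(x, people):
    """Mirror of: likes_everybody(X, [Girl | Others]) :- likes(X, Girl) , likes_everybody(X, Others)."""
    if not people:  # empty list: trivially true
        return True
    girl, *others = people
    return likes(x, girl) and likes_everybody(x, others)


everybody = ["dana", "cody", "bess", "abby"]
```

With these definitions, likes_somebody("dana", everybody) succeeds because dana likes cody, while likes_everybody("dana", everybody) fails at the first person dana does not like.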
I have a fairly simple question about ParseKit and parsing timestamps: how do I go about forcing a dot/period to be treated as a symbol?
For example, if I am trying to parse 2008-01-25, I could use something like date = /\d{4}/ '-' /\d{2}/ '-' /\d{2}/. In fact, there is a date.grammar shipped with ParseKit that does exactly this (interestingly enough, though, the provided grammar doesn't work in the DemoApp unless you add #symbolState='-';, but I digress...)
However, what do I do if I want to parse a date with dots in it, for example 2008.01.25 or 2008-01-25-12.34.45? I've tried adding '.' to the #symbolState directive, but it just keeps getting ignored. Note that I am relying on the DemoApp to test my grammars at the moment; I'm not sure if that makes any difference.
Any thoughts would be much appreciated.
Developer of ParseKit here.
First, thanks for the heads up on the bug in the date.grammar file. I have fixed it.
As for your main question, I'm pretty sure what you are trying was not possible with ParseKit until now.
That is, ParseKit's tokenizer (PKTokenizer) was not able to produce whole-number-only Number tokens. Numbers were always tokenized as floating point, which means it was impossible to parse input like 3.14 as three separate tokens: 3 (Number), . (Symbol), 14 (Number). Rather, it would always be tokenized as the single Number 3.14.
Good news: I've added this capability with a new method:
-[PKNumberState allowsFloatingPoint]
which defaults to YES.
And I added a matching Tokenizer Directive which you can use in your ParseKit Grammars like:
#allowsFloatingPoint = NO;
NOTE: you'll need to check out the latest HEAD of trunk on Google Code to see this feature.
So, here's an example date grammar which does roughly what you were asking for with the new feature:
#symbolState = '.';
#allowsFloatingPoint = NO;
#start = date;
date = year dot month dot day;
year = /\d{4}/;
month = /\d{2}/;
day = /\d{2}/;
dot = '.';
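To make the effect of the floating-point switch concrete, here is a small Python sketch of a tokenizer with an equivalent flag. It is not ParseKit code: tokenize, parse_date and the allows_floating_point parameter are hypothetical names chosen for illustration, and the token kinds only imitate ParseKit's Number and Symbol.

```python
import re


def tokenize(text, allows_floating_point=True):
    """Split text into (kind, value) pairs; '.' and '-' are Symbols unless '.' is part of a float."""
    number = r"\d+\.\d+|\d+" if allows_floating_point else r"\d+"
    pattern = re.compile(rf"(?P<Number>{number})|(?P<Symbol>[.\-])|(?P<Skip>\s+)")
    tokens = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "Skip":  # drop whitespace
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_date(text):
    """Match: date = year dot month dot day, with 4-, 2- and 2-digit numbers."""
    tokens = tokenize(text, allows_floating_point=False)
    kinds = [kind for kind, _ in tokens]
    if kinds != ["Number", "Symbol", "Number", "Symbol", "Number"]:
        raise ValueError(f"not a dotted date: {text!r}")
    year, dot1, month, dot2, day = (value for _, value in tokens)
    if (len(year), len(month), len(day)) != (4, 2, 2) or dot1 != "." or dot2 != ".":
        raise ValueError(f"not a dotted date: {text!r}")
    return int(year), int(month), int(day)
```

With the flag left on, tokenize("2008.01.25") swallows "2008.01" as one Number, so the date rule can never match; with it off, the input becomes five tokens and parse_date("2008.01.25") yields (2008, 1, 25).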
I'm wondering if a simpler idea might be to get ParseKit to simply parse the date as a string, and then hand it off to NSDate::dateWithNaturalLanguageString:locale: or NSDate::dateWithNaturalLanguageString: for processing.
From Parsekit: how to match individual quote characters?
If you define a parser:
#start = int;
int = /[+-]?[0-9]+/;
Unfortunately it isn't going to be parsing any integers prefixed with a "+", unless you include:
#numberState = "+" // at the top.
In the number parse above, the "Symbol" default parser wasn't even mentioned, yet it is still active and overrides user defined parsers.
Okay so with numbers you can still fix it by adding the directive. What if you're trying to create a parser for "++"? I haven't found any directive that can make the following parser work.
#start = plusplus;
plusplus = "++";
The effects of default parsers on the user parser seems so arbitrary. Why can't I parse "++"?
Is it possible to just turn off default Parsers altogether? They seem to get in the way if I'm not doing something common.
Or maybe I've got it all wrong.
EDIT:
I've found a parser that would parse plus plus:
#start = plusplus;
plusplus = plus plus;
plus = "+";
I am guessing the answer is: the literal symbols defined in your parser cannot overlap between default parsers; each must be contained completely by at least one of them.
Developer of ParseKit here.
I have a few responses.
I think you'll find the ParseKit API highly elegant and sensible the more you learn. Keep in mind that I'm not tooting my own horn by saying that. Although I built ParseKit, I did not design the ParseKit API. Rather, the design of ParseKit is based almost entirely on the designs found in Steven Metsker's Building Parsers In Java. I highly recommend you check out the book if you want to deeply understand ParseKit. Plus, it's a fantastic book about parsing in general.
You're confusing Tokenizer States with Parsers. They are two distinct things, but the details are more complex than I can answer here. Again, I recommend Metsker's book.
In the course of answering your question, I did find a small bug in ParseKit. Thanks! However, it was not affecting your outcome described above as you were not using the correct grammar to get the outcome it seems you were looking for. You'll need to update your source code from The Google Code Project now, or else my advice below will not work for you.
Now to answer your question.
I think you are looking for a grammar which both recognizes ++ as a single multi-char Symbol token and also recognizes numbers with leading + chars as explicitly-positive numbers rather than a + Symbol token followed by a Number token.
The correct grammar I believe you are looking for is something like this:
#symbols = '++'; // declare ++ as a multi-char symbol
#numberState = '+'; // allow explicitly-positive numbers
#start = (Number|Symbol)*;
Input like this:
++ +1 -2 + 3 ++
Will be tokenized like so:
[++, +1, -2, +, 3, ++]++/+1/-2/+/3/++^
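The tokenization above can be imitated with a short Python sketch. This is not how PKTokenizer is implemented; the tokenize function, its symbols and number_prefixes parameters, and the rule that multi-char symbols are tried before numbers are all assumptions made for illustration. It also assumes '-' is accepted as a number prefix by default, as the -2 token in the output suggests.

```python
import re


def tokenize(text, symbols=("++",), number_prefixes="+-"):
    """Return a list of token strings: multi-char symbols, signed integers, single chars."""
    multi = sorted(symbols, key=len, reverse=True)  # prefer the longest symbol
    prefix = f"[{re.escape(number_prefixes)}]?" if number_prefixes else ""
    number = re.compile(prefix + r"\d+")
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        symbol = next((s for s in multi if text.startswith(s, pos)), None)
        match = number.match(text, pos)
        if symbol is not None:      # declared multi-char symbol, e.g. "++"
            tokens.append(symbol)
            pos += len(symbol)
        elif match:                 # number, optionally with a sign prefix
            tokens.append(match.group())
            pos = match.end()
        else:                       # anything else is a one-char symbol
            tokens.append(ch)
            pos += 1
    return tokens
```

Calling tokenize("++ +1 -2 + 3 ++") gives the same six tokens as above; passing symbols=() shows the effect of not declaring ++, since each + then comes out as its own token.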
Two reminders:
Again, you will need to update your source code now to see this work correctly. I had to fix a bug in this case.
This stuff is tricky, and I recommend reading Metsker's book to fully understand how ParseKit works.
For the following rule :
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* switchDefaultLabel? switchCaseLabel*)
;
I got an error: "rule switchBlockLabels has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2". I tried adding a syntactic predicate to solve this problem, and I read the book "The Definitive ANTLR Reference". Now I am confused: since there are no alternatives in rule switchBlockLabels, no decision should need to be made about which one to choose.
Can anyone help me?
Whenever the tree parser stumbles upon, say, 2 switchCaseLabels (and no switchDefaultLabel in the middle), it does not know which switchCaseLabel* these labels belong to. There are 3 possibilities the parser can choose from:
2 switchCaseLabels are matched by the 1st switchCaseLabel*;
2 switchCaseLabels are matched by the 2nd switchCaseLabel*;
1 switchCaseLabel is matched by the 1st switchCaseLabel*, and one by the 2nd switchCaseLabel*.
and since the parser does not like to choose for you, it emits an error.
You need to do something like this instead:
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* (switchDefaultLabel switchCaseLabel*)?)
;
That way, when there are only switchCaseLabels, and no switchDefaultLabel, these switchCaseLabels would be always matched by the first switchCaseLabel*: there is no ambiguity anymore.
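The ambiguity can be checked by brute force. The Python sketch below is not what ANTLR does internally (its LL(*) analysis works on the grammar, not on inputs); it simply counts how many ways a given list of labels can be split across the sub-rules of each version of the rule. The function names and the "case"/"default" strings are made up for the illustration.

```python
def count_matches_original(labels):
    """Count ways labels can match: case* default? case*"""
    n = len(labels)
    ways = 0
    for i in range(n + 1):  # the first case* consumes labels[:i]
        if any(label != "case" for label in labels[:i]):
            break
        # default? matches nothing; the second case* takes the rest.
        if all(label == "case" for label in labels[i:]):
            ways += 1
        # default? consumes labels[i]; the second case* takes the rest.
        if i < n and labels[i] == "default" and all(
            label == "case" for label in labels[i + 1:]
        ):
            ways += 1
    return ways


def count_matches_fixed(labels):
    """Count ways labels can match: case* (default case*)?"""
    n = len(labels)
    ways = 0
    for i in range(n + 1):  # the first case* consumes labels[:i]
        if any(label != "case" for label in labels[:i]):
            break
        rest = labels[i:]
        if not rest:  # the optional group is absent
            ways += 1
        elif rest[0] == "default" and all(label == "case" for label in rest[1:]):
            ways += 1
    return ways
```

For two case labels and no default, count_matches_original returns 3, matching the three possibilities listed above, while count_matches_fixed returns 1.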
I'm going through the ParseKit example and trying to modify it to suit my needs and running into this problem. As soon as I pass in the grammar file to parserFromGrammar:assembler, I get an error:
[__NSArrayM objectAtIndex:]: index 0 beyond bounds for empty array
I thought maybe it was because my grammar files had token names with underscores in them. Does ParseKit support underscores? What would the name of the callback method be? That is, would the token name "foo_bar" call a method didMatchFoo_bar?
I then took out all the underscored names and it still gives me that error. I'm using the example grammar file from the ParseKit website:
#start = sentence+;
sentence = adjectives 'beer' '.';
adjectives = cold adjective*;
adjective = cold | freezing;
cold = 'cold';
freezing = 'freezing';
Thanks
Developer of ParseKit here. 2 things:
To answer your first question, I believe the answer is YES.
I just tried out the grammar and it seems to work for me. However, I am using the latest version of ParseKit from Google Code (not GitHub; GitHub is out of date, sorry).
So check out ParseKit from Google Code here:
https://parsekit.googlecode.com/svn/trunk
And then select the "DebugApp" target and "DebugApp" Executable and run.
In the Xcode project, do a global search for "cold freezing beer". You'll see I've added your example as the default example run in DebugApp. It seems to work OK.