String matching problem (can I prioritize?) - antlr

I have a (badly specified) requirement that I recognize certain keywords, but there is also provision for 'any string' ...
For instance, in the input "let's have a ..." I have to handle the case where the last word is "beer", the case where it is "curry", and the case where it is anything else at all (in theory, the keywords beer and curry have priority over all other strings).
When I try to define this, of course, I get
Decision can match input such as "'curry'" using multiple alternatives: 2, 3
As a result, alternative(s) 3 were disabled for that input
I imagine this is a st00pid n00b FAQ, but don't see an obvious answer. Any help gratefully received ...

You need to apply some grammar disambiguation techniques (which you may be learning right now, if this is homework). Generally speaking, you add an additional rule that disambiguates the grammar. Another ANTLR-specific thing you can do is add an action to the rule that handles the differences. I might be able to help more if you post the ANTLR code in question.
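A common way to give keywords priority over a catch-all is to match every word with one general rule and then reclassify it against a keyword table, so no decision ever has two competing alternatives. The sketch below shows only that lookup idea in Python, not ANTLR; the names KEYWORDS, classify and parse_order are hypothetical, and the "let's have a <word>" input shape is assumed from the question.

```python
import re

# Hypothetical keyword table: looked up before falling back to the catch-all.
KEYWORDS = {"beer": "BEER", "curry": "CURRY"}


def classify(word: str) -> str:
    """Return the keyword token type, or ANY for every other string."""
    return KEYWORDS.get(word, "ANY")


def parse_order(text: str) -> tuple[str, str]:
    """Match "let's have a <word>" and classify <word>."""
    match = re.fullmatch(r"let's have a (\S+)", text)
    if match is None:
        raise ValueError(f"unexpected input: {text!r}")
    word = match.group(1)
    return classify(word), word
```

Because the general rule matches every word and the table decides afterwards, "curry" can only ever come out as CURRY, which is the priority the question asks for.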

Related

Use of math in ALFA

How to get a rule like that working:
rule adminCanViewAllExams {
condition (integerOneAndOnly(my.company.attributes.subject.rights) & 0x00000040) == 0
permit
}
The syntax highlighter complains that it doesn't recognize these items:
& (This is a binary math operation)
0x00000040 (this is the hexadecimal representation of an integer)
EDIT
(adding OP's comment inside the question)
I want to keep as much as possible of my current application, meaning I don't want to change much in my database model. I just want to implement the PEP and PDP parts from scratch. Currently, the rights of the user are stored in a Long, where each bit represents a right. To check a right, we do a binary & operation that masks out the other bits in the Long. We might redesign this part, but it's still good to know how far the support for mathematical operations goes.
XACML does not support bitwise logic. It can do boolean logic (AND and OR) but that's about it.
To achieve what you are looking for, you could use a Policy Information Point which would take in my.company.attributes.subject.rights and 0x00000040. It would return an attribute called allowed.
Alternatively, you can extend XACML (and ALFA) to add missing datatypes and functions. But I would recommend going for human-readable policies.
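As a rough illustration of the Policy Information Point approach, the bit test the PIP would perform is small. This Python sketch is hypothetical (it is not an ALFA or XACML API) and assumes a set bit means the right is granted; the rule in the question permits when the masked value is 0, so flip the comparison if that is the intended meaning. The policy itself then only needs to compare the boolean allowed attribute.

```python
# Mask taken from the rule in the question; the name is hypothetical.
ADMIN_VIEW_ALL_EXAMS = 0x00000040


def resolve_allowed(rights: int, mask: int = ADMIN_VIEW_ALL_EXAMS) -> dict[str, bool]:
    """Hypothetical PIP lookup: turn the stored Long into an 'allowed' attribute.

    Assumes a set bit means the right is granted.
    """
    return {"allowed": (rights & mask) != 0}
```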

How do you control a range for type safety?

Imagine you have a function that converts ints to Roman numeral strings:
public String roman(int number)
Only numbers from 1 to 3999 (inclusive) are valid for conversion.
So what do you do if someone passes 4000 in any OO language?
1. raise an exception
2. return "" or some other special string
3. write an assert
…
Number 1: raise an exception. That's what ArgumentOutOfRangeException is for (at least in .NET):
if (intToConvert < 1 || intToConvert > 3999)
{
    throw new ArgumentOutOfRangeException("intToConvert", "Only numbers 1-3999 are valid for conversion.");
}
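For comparison, here is a minimal Python sketch of option 1, with ValueError standing in for .NET's ArgumentOutOfRangeException and the valid range of 1 to 3999 from the question. The table-driven conversion is one common approach, not necessarily the one the question had in mind.

```python
# Value/symbol pairs from largest to smallest, including subtractive forms.
_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman(number: int) -> str:
    """Convert 1..3999 to a Roman numeral; reject anything else loudly."""
    if not 1 <= number <= 3999:
        raise ValueError(f"Only numbers 1-3999 are valid for conversion, got {number}")
    parts = []
    for value, symbol in _NUMERALS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)
```

For example, roman(1994) returns "MCMXCIV", while roman(4000) raises instead of returning a special string the caller might forget to check.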
I find the validation topic very interesting in general. In my opinion, option 2 (returning a special value) is not a good one, since it forces the client to write an if/case to check the returned value, and that check must be repeated everywhere. Also, unlike exceptions, which propagate up the call stack, a special value almost always has to be handled by the immediate caller.
In the context of OOP, raising an exception or having an assertion is, IMO, a more elegant way to cope with it. However, I find that inlining validation code in every method doesn't scale well, for several reasons:
Many times your validation logic ends up being larger than the method logic itself, so you end up cluttering your code with things that are not entirely relevant to it.
There is no proper reuse of validation code (e.g. range validation, e-mail validation, etc.).
This one depends on your taste, but you will be doing defensive programming.
Some years ago I attended a talk about validators (the slides from a similar talk are here; the document explaining it used to be at http://www.caesarsystems.com/resources/caesarsystems/files/Extreme_Validation.pdf, but it now returns a 404 :( ) and I really like the concept. IMHO, a validation framework that adopts the OO philosophy is the way to go. If you want to read more, I've written a couple of posts about it here and here (disclaimer: the posts are part of the blog of the company I work for).
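To make the reuse point concrete, here is a minimal Python sketch of a reusable range validator applied declaratively through a decorator, so the method body stays free of validation code. RangeValidator, validated and describe are hypothetical names; this is not the framework from the talk or the blog posts.

```python
import functools
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeValidator:
    """Reusable inclusive range check."""
    low: int
    high: int

    def validate(self, name: str, value: int) -> None:
        if not self.low <= value <= self.high:
            raise ValueError(f"{name} must be between {self.low} and {self.high}, got {value}")


def validated(**validators):
    """Decorator: run each named validator against the matching argument."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, validator in validators.items():
                validator.validate(name, bound.arguments[name])
            return func(*args, **kwargs)
        return wrapper
    return decorate


@validated(number=RangeValidator(1, 3999))
def describe(number: int) -> str:
    # Only business logic here; the range check lives in the decorator.
    return f"{number} can be converted"
```

The same RangeValidator instance can be attached to any other function that needs the 1 to 3999 rule, which addresses the "no proper reuse" point above.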
HTH

How to disable ParseKit's default Parsers?

From Parsekit: how to match individual quote characters?
If you define a parser:
#start = int;
int = /[+-]?[0-9]+/;
Unfortunately, it won't parse any integers prefixed with a "+" unless you include the following directive at the top:
#numberState = "+";
In the number parser above, the "Symbol" default parser wasn't even mentioned, yet it is still active and overrides user-defined parsers.
Okay, so with numbers you can still fix it by adding the directive. But what if you're trying to create a parser for "++"? I haven't found any directive that makes the following parser work.
#start = plusplus;
plusplus = "++";
The effects of the default parsers on the user's parser seem so arbitrary. Why can't I parse "++"?
Is it possible to just turn off default Parsers altogether? They seem to get in the way if I'm not doing something common.
Or maybe I've got it all wrong.
EDIT:
I've found a parser that would parse plus plus:
#start = plusplus;
plusplus = plus plus;
plus = "+";
I am guessing the answer is: a literal symbol defined in your parser cannot span more than one default parser; it must be matched completely by at least one of them.
Developer of ParseKit here.
I have a few responses.
I think you'll find the ParseKit API highly elegant and sensible the more you learn about it. Keep in mind that I'm not tooting my own horn by saying that. Although I built ParseKit, I did not design the ParseKit API. Rather, the design of ParseKit is based almost entirely on the designs found in Steven Metsker's Building Parsers In Java. I highly recommend you check out the book if you want to deeply understand ParseKit. Plus, it's a fantastic book about parsing in general.
You're confusing Tokenizer States with Parsers. They are two distinct things, but the details are more complex than I can answer here. Again, I recommend Metsker's book.
In the course of answering your question, I did find a small bug in ParseKit. Thanks! However, it was not affecting the outcome described above, because you were not using the correct grammar for the outcome you seem to be looking for. You'll need to update your source code from the Google Code project now, or else my advice below will not work for you.
Now to answer your question.
I think you are looking for a grammar which both recognizes ++ as a single multi-char Symbol token and also recognizes numbers with leading + chars as explicitly-positive numbers rather than a + Symbol token followed by a Number token.
The correct grammar I believe you are looking for is something like this:
#symbols = '++'; // declare ++ as a multi-char symbol
#numberState = '+'; // allow explicitly-positive numbers
#start = (Number|Symbol)*;
Input like this:
++ +1 -2 + 3 ++
Will be tokenized like so:
[++, +1, -2, +, 3, ++]++/+1/-2/+/3/++^
Two reminders:
Again, you will need to update your source code now to see this work correctly. I had to fix a bug in this case.
This stuff is tricky, and I recommend reading Metsker's book to fully understand how ParseKit works.
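Conceptually, the two directives change how the input is split into tokens. Here is a rough Python imitation of the tokenization shown above; it is a regular expression, not ParseKit's tokenizer. Trying `++` before numbers plays the role of `#symbols = '++'`, and the optional sign in the number pattern plays the role of `#numberState = '+'`. The function name tokenize is hypothetical.

```python
import re

# Alternatives are tried left to right at each position:
#   1. the multi-char symbol "++"
#   2. a number with an optional leading + or -
#   3. any other single non-space character as a one-char symbol
# Whitespace matches none of them, so findall skips it.
_TOKEN = re.compile(r"\+\+|[+-]?[0-9]+|\S")


def tokenize(text: str) -> list[str]:
    return _TOKEN.findall(text)
```

Note that "+ 3" still comes out as a "+" symbol followed by "3", because the space stops the number pattern from matching the sign.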

antlr add syntactic predicate

For the following rule :
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* switchDefaultLabel? switchCaseLabel*)
;
I got an error: "rule switchBlockLabels has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2". I tried to add a syntactic predicate to solve this problem, and read "The Definitive ANTLR Reference". Now I am confused: since there are no alternatives in rule switchBlockLabels, no decision should need to be made about which one to choose.
Can anyone help me?
Whenever the tree parser stumbles upon, say, 2 switchCaseLabels (and no switchDefaultLabel in between), it does not know which switchCaseLabel* loop they belong to. There are 3 possibilities the parser can choose from:
2 switchCaseLabels are matched by the 1st switchCaseLabel*;
2 switchCaseLabels are matched by the 2nd switchCaseLabel*;
1 switchCaseLabel is matched by the 1st switchCaseLabel*, and one by the 2nd switchCaseLabel*.
and since the parser does not like to choose for you, it emits an error.
You need to do something like this instead:
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* (switchDefaultLabel switchCaseLabel*)?)
;
That way, when there are only switchCaseLabels, and no switchDefaultLabel, these switchCaseLabels would be always matched by the first switchCaseLabel*: there is no ambiguity anymore.
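The ambiguity can be made concrete by counting how many ways a flat list of labels can be split across the loops in each version of the rule. This Python sketch models labels as the strings "case" and "default" (a simplification, not ANTLR's tree parser); for two case labels the original rule admits the three splits listed above, while the rewritten rule admits exactly one.

```python
def _all_cases(labels: list[str]) -> bool:
    return all(label == "case" for label in labels)


def count_original(labels: list[str]) -> int:
    """Ways to match labels against: case* default? case*"""
    count = 0
    for i in range(len(labels) + 1):
        if not _all_cases(labels[:i]):  # first loop takes labels[:i]
            break
        rest = labels[i:]
        if _all_cases(rest):  # default? absent, second loop takes the rest
            count += 1
        if rest and rest[0] == "default" and _all_cases(rest[1:]):
            count += 1  # default? present, second loop takes what follows
    return count


def count_refactored(labels: list[str]) -> int:
    """Ways to match labels against: case* (default case*)?"""
    count = 0
    for i in range(len(labels) + 1):
        if not _all_cases(labels[:i]):
            break
        rest = labels[i:]
        if not rest or (rest[0] == "default" and _all_cases(rest[1:])):
            count += 1
    return count
```

In general, n case labels without a default give n + 1 parses under the original rule and exactly 1 under the rewritten one.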

Bison input analyzer - basic question on optional grammar and input interpretation

I am very new to Flex/Bison, so this is a very naive question.
Pardon me if so. It may look like a homework question, but I need to implement a project based on the concept below.
My question has two parts.
Question 1
In a Bison parser, how do I provide rules for optional input?
For example, I need to parse the following statement:
-country='USA' -state='INDIANA' -population='100' -ratio='0.5' -comment='Census study for Indiana'
Here the ratio token is optional. If many tokens are optional, how do I write the grammar in the parser to handle that?
My code looks like this:
%start program
program : TK_COUNTRY TK_IDENTIFIER TK_STATE TK_IDENTIFIER TK_POPULATION TK_IDENTIFIER ...
where all the tokens are defined in the lexer. Since many of the tokens are optional, using "|" would mean spelling out many different input combinations.
Question 2
There is a good chance that the comment will contain quotes as part of the input, so I have added a -tag token that the user can provide to mark an escaped quote. Example:
-country='USA' -state='INDIANA' -population='100' -ratio='0.5' -comment='Census study for Indiana$'s population' -tag=$
Now, I need to reinterpret Indiana$'s as Indiana's since -tag=$.
Please point me to any input or related material that would help me understand these topics.
Q1: I am assuming we have four possible tokens: NAME, '-', '=' and VALUE.
Then the grammar could look like this:
attrs:
attr attrs
| attr
;
attr:
'-' NAME '=' VALUE
;
Note that, unless you make specific attribute names distinct tokens, there is no way to say in the grammar "We must have country, state and population, but ratio is optional."
This would be the task of that part of the program that analyses the data produced by the parser.
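The post-parse check described above can be sketched in Python. Here a regular expression stands in for the attrs/attr rules (it is not Bison), and the names ATTR, REQUIRED, parse_attrs and check_required are hypothetical. Attributes may appear in any order; the required-name check happens only after all of them have been collected.

```python
import re

# Stand-in for the attrs/attr rules: any number of -NAME='VALUE' pairs.
ATTR = re.compile(r"-(\w+)='([^']*)'")
REQUIRED = {"country", "state", "population"}


def parse_attrs(line: str) -> dict[str, str]:
    """Collect every -name='value' pair, in any order."""
    return dict(ATTR.findall(line))


def check_required(attrs: dict[str, str]) -> None:
    """Semantic check done after parsing: required names must be present."""
    missing = REQUIRED.difference(attrs)
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
```

With this split, ratio (or any other optional attribute) simply does or does not appear in the dictionary, and the grammar never has to enumerate combinations.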
Q2: As I understand it, you are thinking of changing the way lexical analysis works while the parser is running. This is not a good idea, at least not for a beginner. Have you even started to think about lexical analysis, as opposed to parsing?
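If the -tag attribute always comes last on the line, one way to handle it without switching lexer behavior mid-parse is to find the tag in a separate pass before lexing the values. A minimal Python sketch, using regular expressions instead of Flex/Bison; parse_with_tag is a hypothetical name, and the assumptions (a single-character tag that appears once, at the end of the line) belong to the sketch, not to the question.

```python
import re


def parse_with_tag(line: str) -> dict[str, str]:
    """Find an optional trailing -tag=X first, then lex values with X' as an escaped quote."""
    tag_match = re.search(r"\s-tag=(\S)\s*$", line)
    if tag_match:
        tag = tag_match.group(1)
        body = line[:tag_match.start()]
        # A value is any run of (tag followed by a quote) or non-quote characters.
        attr = re.compile(rf"-(\w+)='((?:{re.escape(tag)}'|[^'])*)'")
    else:
        tag = None
        body = line
        attr = re.compile(r"-(\w+)='([^']*)'")
    attrs = {}
    for name, raw in attr.findall(body):
        # Turn each escaped quote (e.g. $') back into a plain quote.
        attrs[name] = raw.replace(tag + "'", "'") if tag else raw
    return attrs
```

Because the tag is known before any value is lexed, the escape rule is fixed for the whole line instead of changing halfway through.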