Can someone help me understand this grammar?

I can't understand this grammar rule. What do the "returns" and "current" keywords mean?
WhereEntry returns WhereEntry:
AndWhereEntry ({OrWhereEntry.entries+=current}
("OR" entries+=AndWhereEntry)+)?
;

returns means that the result of the rule will be of type WhereEntry.
current means the object parsed so far.
In sum, this means:
AndWhereEntry is a subclass of WhereEntry.
If there is an OR, a new OrWhereEntry is created and the AndWhereEntry parsed before it is added to the OrWhereEntry's entries list.
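One way to picture what the rule builds is the Python sketch below. It is not what Xtext generates. The class names mirror the grammar, but parse_where_entry, the text field, and the token list format are made up for illustration, and each non-OR token stands in for a whole AndWhereEntry.

```python
from dataclasses import dataclass, field


@dataclass
class WhereEntry:
    """Base type: what 'returns WhereEntry' promises the rule produces."""


@dataclass
class AndWhereEntry(WhereEntry):
    text: str = ""  # hypothetical payload, stands in for the real content


@dataclass
class OrWhereEntry(WhereEntry):
    entries: list = field(default_factory=list)


def parse_where_entry(tokens):
    """Hand-written model of the WhereEntry rule (illustrative only)."""
    pos = 0
    # AndWhereEntry: the object parsed so far becomes 'current'.
    current = AndWhereEntry(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] == "OR":
        # {OrWhereEntry.entries+=current}: create a new OrWhereEntry,
        # put the old 'current' into its entries, and make it the result.
        current = OrWhereEntry(entries=[current])
        # ("OR" entries+=AndWhereEntry)+
        while pos < len(tokens) and tokens[pos] == "OR":
            if pos + 1 >= len(tokens):
                raise ValueError("expected an AndWhereEntry after OR")
            current.entries.append(AndWhereEntry(tokens[pos + 1]))
            pos += 2
    return current
```

With no OR, the result is the plain AndWhereEntry. With at least one OR, the result is an OrWhereEntry whose first entry is the AndWhereEntry that had already been parsed.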

Related

How to factorize a string to check its belonging to language that is generated from alphabet?

Let S = {a, bb, bab, abaab} be an alphabet. Its Kleene closure S* contains all possible concatenations of its elements.
Is the string abaabbabbaab in S*?
What is the method to factorize a string to check whether it is in S* or not?
I tried the following.
Possible factorizations:
(abaab)(bab)(b)(a)(a)(b)
(abaab)(bab)(b)(aa)(b)
(abaab)(bab)(ba)(ab)
(abaab)(bab)(baa)(b)
(abaab)(bab)(b)(aab)
We can see that (abaab)(bab) matches, but the remaining part does not match any combination of elements of S. I have factorized the remaining part in many ways, but it still does not match.
I want to ask:
Is it correct?
Is this the correct way to factorize (tokenize) the string?
Are all of these factorizations correct?
Is this a correct method to check whether a string belongs to a language or not?
Some of your factorizations contain $(b)$, which is not in $S$, so they are not correct. The one that avoids $(b)$ uses $(ba)$ and $(ab)$, which are not in $S$ either.
I think your method is exhaustive trial and error. If you do it correctly, it is a correct way to find a factorization. For checking membership of a language, it works if the language is given in the form of the Kleene closure of a finite language.
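The trial and error can be made systematic. The Python sketch below marks which positions of the string can be reached by concatenating elements of $S$, then reads one factorization back. The function name factorize is made up for illustration, and it assumes $S$ is finite.

```python
def factorize(word, pieces):
    """Return one factorization of word over pieces, or None if there is none."""
    n = len(word)
    reachable = [False] * (n + 1)  # reachable[i]: word[:i] is in pieces*
    last = [None] * (n + 1)        # last[i]: a piece that ends at position i
    reachable[0] = True            # the empty string is always in pieces*
    for i in range(n):
        if not reachable[i]:
            continue
        for p in pieces:
            j = i + len(p)
            # Skip empty pieces; startswith guarantees j <= n.
            if p and word.startswith(p, i) and not reachable[j]:
                reachable[j] = True
                last[j] = p
    if not reachable[n]:
        return None
    # Walk backwards through the recorded pieces to rebuild the factors.
    factors = []
    i = n
    while i > 0:
        factors.append(last[i])
        i -= len(last[i])
    return factors[::-1]


S = ["a", "bb", "bab", "abaab"]
print(factorize("abaabbabbaab", S))  # None: the string is not in S*
print(factorize("abaabbab", S))      # ['abaab', 'bab']
```

For the string in the question, the only reachable positions are 0, 1, 5 and 8, so the end of the string is never reached. Note that the empty string yields an empty list, so test the result with "is not None" rather than by truthiness.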

RRULE (rfc 5545) until and count

I'm having trouble understanding RFC 5545 concerning UNTIL and COUNT. From what I understand, UNTIL and COUNT cannot be in the same recur rule according to this part of the RFC:
Value Name: RECUR
Purpose: This value type is used to identify properties that
contain a recurrence rule specification.
Formal Definition: The value type is defined by the following
notation:
recur = "FREQ"=freq *(
; either UNTIL or COUNT may appear in a 'recur',
; but UNTIL and COUNT MUST NOT occur in the same 'recur'
...
Further on in the RFC, this is stated:
If multiple BYxxx rule parts are specified, then after evaluating the
specified FREQ and INTERVAL rule parts, the BYxxx rule parts are
applied to the current set of evaluated occurrences in the following
order: BYMONTH, BYWEEKNO, BYYEARDAY, BYMONTHDAY, BYDAY, BYHOUR,
BYMINUTE, BYSECOND and BYSETPOS; then COUNT and UNTIL are evaluated.
This last paragraph seems to imply that COUNT and UNTIL can be in the same RRULE.
When I check libraries that implement RRULE generation and parsing, there is no validation that makes sure that COUNT and UNTIL are not in the same recur.
What do implementations generally do about this? Should we skip this validation and simply use the UNTIL parameter when both COUNT and UNTIL are present (or vice versa)? What exactly does the RFC mean concerning the COUNT and UNTIL parameters?
I don't think you can derive from the second paragraph that having both is valid.
There is only one definition of RECUR and the cardinality of its various components: the ABNF definition. This is where you should go to check the validity of your property.
The second paragraph simply describes the algorithm to use for doing RRULE expansion.
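If you want to enforce the ABNF constraint yourself, a minimal Python sketch could look like the one below. The function check_recur is hypothetical and not taken from any RRULE library. It only checks that FREQ is present, that no rule part is repeated, and that UNTIL and COUNT do not appear together. It does not validate the values themselves.

```python
def check_recur(value):
    """Split a RECUR value into rule parts and apply a few structural checks.

    Returns a dict mapping rule part names to their raw string values.
    Raises ValueError when a check fails.
    """
    parts = {}
    for chunk in value.split(";"):
        name, sep, val = chunk.partition("=")
        name = name.strip().upper()
        if not sep or not name or not val:
            raise ValueError(f"malformed rule part: {chunk!r}")
        if name in parts:
            raise ValueError(f"{name} occurs more than once")
        parts[name] = val
    if "FREQ" not in parts:
        raise ValueError("FREQ is required")
    if "UNTIL" in parts and "COUNT" in parts:
        raise ValueError("UNTIL and COUNT must not occur in the same recur")
    return parts


print(check_recur("FREQ=DAILY;COUNT=10"))
# {'FREQ': 'DAILY', 'COUNT': '10'}
```

Rejecting the value is only one possible policy. A lenient consumer could keep one of the two parts and ignore the other, but that behavior is a choice of the implementation and is not something the RFC defines.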

Is there a way to get the number of tokens in an ANTLR4 parser rule?

In ANTLR4, it seems that predicates can only be placed at the front of sub-rules in order for them to cause the sub-rule to be skipped. In my grammar, some predicates depend on a token that appears near the end of the sub-rule, with one or more rule invocations in front of it. For example:
date :
{isYear(_input.LT(3).getText())}?
month day=INTEGER year=INTEGER { ... }
In this particular example, I know that month is always one single token, so it is always Token 3 that needs to be checked by isYear(). In general, though, I won't know the number of tokens making up a rule like month until runtime. Is there a way to get its token count?
There is no built-in way to get the length of the rule programmatically. You could use the documentation for ATNState in combination with the _ATN field in your parser to calculate all paths through a rule. If all paths through the rule contain the same number of tokens, then you have calculated the exact number of tokens used by the rule.

antlr add syntactic predicate

For the following rule:
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* switchDefaultLabel? switchCaseLabel*)
;
I got an error: "rule switchBlockLabels has non-LL decision due to recursive rule invocations reachable from alts 1,2". I tried to add a syntactic predicate to solve this problem, and I read the book "The Definitive ANTLR Reference". Now I am confused: since there are no alternatives in rule switchBlockLabels, no decision should need to be made about which one to choose.
Can anyone help me?
Whenever the tree parser stumbles upon, say, 2 switchCaseLabels (and no switchDefaultLabel in the middle), it does not know which switchCaseLabel* these switchCaseLabels belong to. There are 3 possibilities the parser can choose from:
2 switchCaseLabels are matched by the 1st switchCaseLabel*;
2 switchCaseLabels are matched by the 2nd switchCaseLabel*;
1 switchCaseLabel is matched by the 1st switchCaseLabel*, and one by the 2nd switchCaseLabel*.
and since the parser does not like to choose for you, it emits an error.
You need to do something like this instead:
switchBlockLabels
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel* (switchDefaultLabel switchCaseLabel*)?)
;
That way, when there are only switchCaseLabels, and no switchDefaultLabel, these switchCaseLabels would be always matched by the first switchCaseLabel*: there is no ambiguity anymore.
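To see the difference concretely, the Python sketch below enumerates every way a flat list of labels can be matched by each version of the rule. It is only a hand-written model of the two patterns, not ANTLR output. The function name derivations and the "case"/"default" strings are made up for illustration.

```python
def derivations(labels, fixed):
    """List every way labels can be matched by the rule body.

    fixed=False models:  case* default? case*
    fixed=True  models:  case* (default case*)?
    Each result is (first_run, default_or_None, second_run).
    """
    found = []
    for i in range(len(labels) + 1):
        first, rest = labels[:i], labels[i:]
        if any(x != "case" for x in first):
            break  # the first case* can only consume case labels
        # Option A: no default label is consumed.
        if all(x == "case" for x in rest):
            # In the fixed rule, the second case* is only reachable
            # after a default label, so rest must be empty here.
            if not fixed or not rest:
                found.append((first, None, rest))
        # Option B: a default label directly follows the first run.
        if rest and rest[0] == "default" and all(x == "case" for x in rest[1:]):
            found.append((first, "default", rest[1:]))
    return found


two_cases = ["case", "case"]
print(len(derivations(two_cases, fixed=False)))  # 3
print(len(derivations(two_cases, fixed=True)))   # 1
```

With a default label in the list, both versions have exactly one derivation, because the default label pins the boundary between the two runs.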

ANTLR lexing getting confused over '...' and floats

I think the ANTLR lexer is treating my attempt at a range expression "1...3" as a float. The expression "x={1...3}" is coming out of the lexer as "x={.3}" when I used the following token definitions:
FLOAT
: ('0'..'9')+ ('.' '0'..'9'+)? EXPONENT?
| ('.' '0'..'9')+ EXPONENT?
;
AUTO : '...';
When I change FLOAT to just check for integers, as so:
FLOAT : ('0'..'9')+;
then the expression "x={1...3}" is tokenized correctly. Can anyone help me to fix this?
Thanks!
I think the lexer is putting your first period into the FLOAT token, and then the remaining two periods do not make up your AUTO token. You will need a predicate to determine whether the period should be part of a float or an auto token.
Why are you using three periods instead of two? Most languages use two periods for a "range", and the language should determine whether the period is part of a float or the range based on the following character.
You probably need to look into The Definitive ANTLR Reference for how to build your predicate for the different rules.
Hope this helps you find the correct way to complete the task.
WayneH hits on your problem. You've allowed floats in the format ".3" (without a leading 0). So, the lexer identifies the last . and the 3 and considers it a floating point number. As a result it doesn't see three dots. It sees two dots and a float.
It's very common for languages to disallow this format for floats and require that there be at least one digit (even if it's a 0) to the left of the decimal. I believe that change to your grammar would fix your problem.
There probably is a way to fix it with a predicate, but I've not yet spent enough time with ANTLR to see an obvious way to do so.
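As an illustration of the intended tokenization once a float must start with a digit, here is a small Python sketch using regular expressions. It is a model, not an ANTLR grammar. Python's regex engine backtracks, which an ANTLR 3 lexer does not necessarily do in the same way, so it does not prove that the grammar change alone is sufficient. The exponent pattern, the ID and PUNCT tokens, and the tokenize function are assumptions added for the example.

```python
import re

# AUTO is listed first so that '...' is never split up.
# FLOAT requires at least one digit before an optional fraction,
# so a bare '.3' is not a float.
TOKEN_RE = re.compile(
    r"""
    (?P<AUTO>\.\.\.)
  | (?P<FLOAT>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<PUNCT>[={}])
  | (?P<WS>\s+)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x={1...3}"))
# [('ID', 'x'), ('PUNCT', '='), ('PUNCT', '{'), ('FLOAT', '1'),
#  ('AUTO', '...'), ('FLOAT', '3'), ('PUNCT', '}')]
```

Here "1" is matched as a FLOAT without a fraction because the character after the first period is not a digit, which leaves all three periods for AUTO.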
For anyone wanting to do this...
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
I can just change the language syntax to replace the "..." with a "to" keyword.