CFG: Why is this grammar ambiguous?

The grammar is as follows.
S -> SS' | a | b
S' -> a | b
The way I understand it, derivations from this grammar will be like SS'S'S'S'... (0 or more S'), where each S or S' will generate a or b.
Can someone provide an example that shows this grammar is ambiguous? (The solution says it is.)

It isn't ambiguous. Your analysis is correct.
Here's a mechanical check of your grammar (reshaped for our tool):
S = S Sprime ;
S = a ;
S = b ;
Sprime = a ;
Sprime = b ;
Execution of tool:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator>run ParserGenerator.P0B -interactive C:\
DMS GLR Parser Generator 2.4.1
Copyright (C) 1997-2018 Semantic Designs, Inc.
Opening C:\temp\Example.bnf
*** EOF seen
<<<Rule Collection Completed>>>
NTokens = 5 NRules = 5
LR(1) Parser Generator -- Find Follow and SLR Lookahead sets
Computing MemberSets for Nonterminal Tokens...
What next? ambiguities 100
Print results where (<CR> defaults to console)?
Default paper width: 80
How wide should the printout be (<CR> selects default)?
*** Search for ambiguities to depth 100
Nonterminal < Sprime > is not ambiguous
*** Search for ambiguities to depth 1; trying 2 rule pairs...
*** Search for ambiguities to depth 2; trying 2 rule pairs...
*** Search for ambiguities to depth 3; trying 2 rule pairs...
*** Search for ambiguities to depth 4; trying 2 rule pairs...
Nonterminal < S > is not ambiguous [modulo rule derivation loops]
*** 0 ambiguities found ***
*** All ambiguities in grammar detected ***
This tool is rather overkill for a grammar with two nonterminals. But when somebody gives you a set of 200 nonterminals, it is much harder to do by hand.
(For theorists: this tool obviously can't decide this for all grammars.
It uses a recursive iterative-deepening search in the space of nonterminal expansions to look for duplicate/ambiguous expansions. That works pretty well in practice.)
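For a grammar this small you can also check the claim with a few lines of code. The sketch below is not how the DMS tool works; it simply counts the parse trees of every string over {a, b} up to a small length. The names GRAMMAR, count_parses and "Sp" (standing in for S') are made up for this example, and the counter assumes a grammar with no empty right-hand sides and no unit cycles.

```python
from functools import lru_cache
from itertools import product

# Grammar from the question; "Sp" stands for S'.
GRAMMAR = {
    "S": [("S", "Sp"), ("a",), ("b",)],
    "Sp": [("a",), ("b",)],
}

def count_parses(grammar, start, tokens):
    """Count the distinct parse trees of `tokens` derived from `start`.

    Assumes no empty right-hand sides and no unit cycles, so every symbol
    covers at least one token and the recursion always shrinks the span.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:  # terminal
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        total = 0
        # The first symbol takes tokens[i:k]; at least one token is left
        # for each of the remaining symbols.
        for k in range(i + 1, j - len(rhs) + 2):
            left = sym(rhs[0], i, k)
            if left:
                total += left * seq(rhs[1:], k, j)
        return total

    return sym(start, 0, len(tokens))

# Every non-empty string over {a, b} up to length 6.
counts = {
    "".join(w): count_parses(GRAMMAR, "S", w)
    for n in range(1, 7)
    for w in product("ab", repeat=n)
}
print(all(c == 1 for c in counts.values()))
```

A bounded check like this is only evidence, not a proof. The proof is the observation from the question: S' always derives exactly one symbol, so the split between S and S' in S -> SS' is forced.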

Related

What are terminal and nonterminal symbols?

I am reading Rebol Wikipedia page.
"Parse expressions are written in the parse dialect, which, like the do dialect, is an expression-oriented sublanguage of the data exchange dialect. Unlike the do dialect, the parse dialect uses keywords representing operators and the most important nonterminals"
Can you explain what terminals and nonterminals are? I have read a lot about grammars, but did not understand what they mean. Here is another link where these words are used very often.
Definitions of terminal and non-terminal symbols are not Parse-specific, but are concerned with grammars in general. Things like this wiki page or intro in Grune's book explain them quite well. OTOH, if you're interested in how Red Parse works and yearn for simple examples and guidance, I suggest to drop by our dedicated chat room.
"parsing" has slightly different meanings, but the one I prefer is conversion of linear structure (string of symbols, in a broad sense) to a hierarchical structure (derivation tree) via a formal recipe (grammar), or checking if a given string has a tree-like structure specified by a grammar (i.e. if "string" belongs to a "language").
All symbols in a string are terminals, in a sense that tree derivation "terminates" on them (i.e. they are leaves in a tree). Non-terminals, in turn, are a form of abstraction that is used in grammar rules - they group terminals and non-terminals together (i.e. they are nodes in a tree).
For example, in the following Parse grammar:
greeting: ['hi | 'hello | 'howdy]
person: [name surname]
name: ['john | 'jane]
surname: ['doe | 'smith]
sentence: [greeting person]
greeting, person, name, surname and sentence are non-terminals (because they never actually appear in the linear input sequence, only in grammar rules);
hi, hello, howdy with john, jane, doe and smith are terminals (because parser cannot "expand" them into a set of terminals and non-terminals as it does with non-terminals, hence it "terminates" by reaching the bottom).
>> parse [hi jane doe] sentence
== true
>> parse [howdy john smith] sentence
== true
>> parse [wazzup bubba ?] sentence
== false
As you can see, terminals and non-terminals form disjoint sets, i.e. a symbol can be in one of them but not in both; moreover, inside grammar rules, only non-terminals can be written on the left side.
One grammar can match different strings, and one string can be matched by different grammars (in the example above, it could be [greeting name surname], or [exclamation 2 noun], or even [some noun], provided that exclamation and noun non-terminals are defined).
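If you don't have Red at hand, the same split can be sketched in Python. This is only an illustration and not how Parse is implemented: RULES, match_symbol and accepts are names invented here, and the matcher handles only small grammars without left recursion.

```python
# The Parse rules above as a Python dict: each key is a non-terminal and
# each value lists its alternatives (sequences of symbols).
RULES = {
    "greeting": [["hi"], ["hello"], ["howdy"]],
    "person":   [["name", "surname"]],
    "name":     [["john"], ["jane"]],
    "surname":  [["doe"], ["smith"]],
    "sentence": [["greeting", "person"]],
}

# Non-terminals appear on the left side of a rule; everything else is a terminal.
NONTERMINALS = set(RULES)
TERMINALS = {s for alts in RULES.values() for alt in alts for s in alt} - NONTERMINALS

def match_symbol(symbol, words, pos=0):
    """Yield every position where `symbol` can stop after starting at `pos`."""
    if symbol not in RULES:  # terminal: must equal the next input word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for alternative in RULES[symbol]:
        positions = {pos}
        for part in alternative:
            positions = {end for p in positions
                         for end in match_symbol(part, words, p)}
        yield from positions

def accepts(words, start="sentence"):
    # The input is accepted if `start` can consume all of it.
    return len(words) in set(match_symbol(start, words))

print(sorted(TERMINALS))
print(accepts("hi jane doe".split()))
print(accepts("wazzup bubba ?".split()))
```

Note that the matcher only ever compares input words against terminals; non-terminals exist purely inside RULES.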
And, as usual, one picture is worth a thousand words:
Hope that helps.
think of it like this:
a digit can be 1-9
now I will tell you to write down a digit on a page.
so you know that you can write down 1,2,3,4,5,6,7,8,9
basically the nonterminal symbol is "digit"
and the terminal symbols are 1,2,3,4,5,6,7,8,9
when I told you to write down a digit on a page, you wrote down 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9
you didn't write down the word "digit", you wrote down 1 or 2 or 3....
do you see where I'm going?
let's try to make our own "rules"
let's "create" a nonterminal symbol we will call it "Olaf"
Olaf can be a dog (NOTE: dog is terminal)
Olaf can be a cat (NOTE: cat is terminal)
Olaf can be a digit (NOTE: digit is nonterminal)
Now I'm telling you that you can write down an Olaf on a page.
so that means you can write down "dog"
you can also write down "cat"
you can also write down a digit, which means you can write down 1 or 2 or 3...
because digit is a nonterminal symbol you don't write down "digit", you write down
the symbols that digit refers to, which are 1 or 2 or 3 etc...
in the end only terminal symbols are written on the "page"
one more thing you may encounter one day: when you say "a nonterminal can be something",
there is a special term for that: a "production rule" (also called a "production")
for example
Olaf can be "dog"
Olaf can be "cat"
Olaf can be digit
we have 3 productions here; in other words, we have 3 definitions of Olaf
specifications of programming languages use these ideas quite a lot when defining the syntax of a language
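here is the Olaf example as a tiny Python sketch. PRODUCTIONS and writable are names made up for this illustration, and it assumes every production is a single symbol, as in the example above:

```python
# Each nonterminal maps to the things it "can be" (its productions).
PRODUCTIONS = {
    "digit": [str(d) for d in range(1, 10)],
    "Olaf": ["dog", "cat", "digit"],
}

def writable(symbol):
    """Return everything that may end up on the page for `symbol`."""
    if symbol not in PRODUCTIONS:  # terminal: written down as-is
        return {symbol}
    result = set()
    for alternative in PRODUCTIONS[symbol]:
        # A nonterminal is never written; it is replaced by what it can be.
        result |= writable(alternative)
    return result

print(sorted(writable("Olaf")))
```

the word "digit" never shows up in the result, only dog, cat and 1 to 9 do.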

How can I show that this grammar is ambiguous?

I want to prove that this grammar is ambiguous, but I'm not sure how I am supposed to do that. Do I have to use parse trees?
S -> if E then S | if E then S else S | begin S L | print E
L -> end | ; S L
E -> i
You can show it is ambiguous if you can find a string that parses more than one way (the parentheses below only mark the two groupings; they are not tokens of the grammar):
if i then ( if i then print i else print i ; )
if i then ( if i then print i ) else print i ;
This happens to be the classic "dangling else" ambiguity. Googling your tag(s), title & grammar gives other hits.
However, if you don't happen to guess at an ambiguous string then googling your tag(s) & title:
how can i prove that this grammar is ambiguous?
There is no easy method for proving a context-free grammar ambiguous -- in fact,
the question is undecidable, by reduction to the Post correspondence problem.
You can put the grammar into a parser generator which supports all context-free grammars, a context-free general parser generator. Generate the parser, then parse a string which you think is ambiguous and find out by looking at the output of the parser.
A context-free general parser generator generates parsers which produce all derivations in polynomial time. Examples of such parser generators include SDF2, Rascal, DMS, Elkhound, ART. There is also a backtracking version of yacc (btyacc) but I don't think it does it in polynomial time. Usually the output is encoded as a graph where alternative trees for sub-sentences are encoded with a nested set of alternative trees.
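If none of those tools is at hand, a candidate string can be checked against a small grammar with a brute-force enumerator. The sketch below is a naive exponential search, not what the tools above do; GRAMMAR, trees, sequences and all_trees are names invented here, and it assumes no empty right-hand sides and no unit cycles. The test string is the dangling-else example without parentheses or a trailing ';', since the grammar as quoted has no parenthesis tokens and derives no sentence ending in ';'.

```python
# Grammar from the question, one tuple per alternative.
GRAMMAR = {
    "S": [("if", "E", "then", "S"),
          ("if", "E", "then", "S", "else", "S"),
          ("begin", "S", "L"),
          ("print", "E")],
    "L": [("end",), (";", "S", "L")],
    "E": [("i",)],
}

def trees(symbol, tokens, i, j):
    """Yield bracketed parse trees of tokens[i:j] rooted at `symbol`."""
    if symbol not in GRAMMAR:  # terminal
        if j - i == 1 and tokens[i] == symbol:
            yield symbol
        return
    for rhs in GRAMMAR[symbol]:
        for children in sequences(rhs, tokens, i, j):
            yield "(" + symbol + " " + " ".join(children) + ")"

def sequences(rhs, tokens, i, j):
    """Yield lists of subtrees, one per symbol of `rhs`, covering tokens[i:j]."""
    if not rhs:
        if i == j:
            yield []
        return
    head, rest = rhs[0], rhs[1:]
    # `head` takes tokens[i:k]; leave at least one token per remaining symbol.
    for k in range(i + 1, j - len(rest) + 1):
        for tree in trees(head, tokens, i, k):
            for others in sequences(rest, tokens, k, j):
                yield [tree] + others

def all_trees(text, start="S"):
    tokens = text.split()
    return list(trees(start, tokens, 0, len(tokens)))

for tree in all_trees("if i then if i then print i else print i"):
    print(tree)
```

Two different trees for the same string are exactly the witness of ambiguity; a string with a single tree says nothing about the grammar as a whole.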

Difference between grammar rules

Say there are two grammar rules
Rule 1 B -> aB | cB
and
Rule 2 B -> Ba | Bc
I'm a bit confused as to the difference between these two. Would rule 1's expression be (a+c)* ? Then what would Rule 2's expression be?
Both of those grammars yield the empty language since there is no non-recursive rule, so no sentence consisting only of terminals can be derived.
If you add the production B→ε, both grammars would yield the same language, equivalent to the regular expression (a+c)*. However, the parse trees produced by the parser would be quite different.
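Both claims are easy to check by enumerating short strings. In this sketch the names RIGHT, LEFT, NO_BASE and language are invented for the example; the enumerator assumes a sentential form never holds more than a bounded number of nonterminals, which is true here because each form contains at most one B. Note that the + in (a+c)* means "or", which Python's re module writes as |.

```python
import re
from itertools import product

RIGHT = {"B": [("a", "B"), ("c", "B"), ()]}    # Rule 1 plus B -> ε
LEFT = {"B": [("B", "a"), ("B", "c"), ()]}     # Rule 2 plus B -> ε
NO_BASE = {"B": [("a", "B"), ("c", "B")]}      # Rule 1 as originally given

def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    seen, found = set(), set()
    frontier = [(start,)]
    while frontier:
        form = frontier.pop()
        if form in seen:
            continue
        seen.add(form)
        # Position of the first nonterminal in the sentential form, if any.
        idx = next((n for n, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Drop forms that already hold too many terminals.
            if sum(s not in grammar for s in new) <= max_len:
                frontier.append(new)
    return found

expected = {"".join(w) for n in range(5) for w in product("ac", repeat=n)}
print(language(RIGHT, "B", 4) == expected)
print(language(LEFT, "B", 4) == expected)
print(all(re.fullmatch(r"(a|c)*", w) for w in expected))
print(language(NO_BASE, "B", 4))
```

The difference between the two shows up only in the trees: Rule 1 nests to the right (a(c(a ε))) and Rule 2 nests to the left (((ε a)c)a).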

Grammar Ambiguous or Not Ambiguous?

I'm new to BNF, and I've got a tutorial question to solve. The question is given below.
'For each of the following grammars specify whether they are ambiguous or unambiguous'.
Grammar1:
<T> ::= <T> <Q> 0 | 22
<Q> ::= 2|3
Grammar2:
<first>::=<first><first><second>
<second>::=<third><second>
<third>::=a|b|c
Grammar3:
<A>::=<B><A><A>
<B>::=1|2|3|4
Can somebody please help me find the answers and explain them in a way that is easy to understand? That would be a great help.
To detect an ambiguity in the grammar, you need to exhibit a string that can be parsed two ways.
Finding such a string can be hard with a large grammar; in fact, it can be impossibly hard.
But you can do this by hand by exploring various token sequences. This gets dull fast, and doesn't work in practice if the grammar is anything but trivial.
What you really want to do is build a tool that enumerates possible strings of characters and tries them to see if there is an ambiguity.
You can do this brute force by simply generating all strings, but this rapidly produces many strings that are simply unparseable and that doesn't help.
Or, you can generate strings using the grammar as a guide, ensuring that each extension to a proposed string produces something the grammar will still accept. This way all generated strings are valid, so at least you are producing valid junk.
You can do this as a depth-first search across the grammar rules. You end up mechanizing the following process:
1. Pick a pair of rules with the same LHS.
2. Instantiate s1 with the RHS of the first rule, s2 with the RHS of the second.
3. Repeat until you are tired (hit some search depth):
   a. If s1 == s2, you've found an ambiguity.
   b. If s1 derives a terminal that s2 does not derive,
      then s1 and s2 cannot be ambiguous.
   c. Pick a nonterminal in s1 or s2.
      If there is none, then if s1 <> s2, this path doesn't lead to an ambiguity: backtrack.
   d. Replace the nonterminal with a valid RHS for that nonterminal.
   e. Recurse to a.
4. If all branches of the search lead to non-ambiguous strings,
   then this rule isn't ambiguous.
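To make the idea concrete, here is a much cruder search than the one outlined above; it is not the DMS implementation. Instead of comparing pairs of rules, it explores every leftmost derivation up to a length bound and reports the first sentence reached by two different derivations, which is the same thing as having two parse trees. The function name find_ambiguity and the dict encoding of the grammars are made up for this sketch, and it assumes no empty right-hand sides and no unit cycles, so that the bounded search is finite.

```python
from collections import Counter

def find_ambiguity(grammar, start, max_len):
    """Return a sentence with two leftmost derivations, or None.

    Only derivations whose sentential forms stay within max_len symbols
    are explored, so None means "nothing found up to this bound".
    """
    counts = Counter()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        # Leftmost derivation: always expand the first nonterminal.
        idx = next((n for n, s in enumerate(form) if s in grammar), None)
        if idx is None:
            counts[form] += 1
            if counts[form] > 1:
                return " ".join(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len:
                stack.append(new)
    return None

# Unambiguous example: T -> T Q 0 | 2 2, Q -> 2 | 3.
T_GRAMMAR = {
    "T": [("T", "Q", "0"), ("2", "2")],
    "Q": [("2",), ("3",)],
}

# If-then-else grammar; E and V are left undefined, so they act as tokens.
IF_GRAMMAR = {
    "G": [("S",)],
    "S": [("if", "E", "then", "S"),
          ("if", "E", "then", "S", "else", "S"),
          ("V", "=", "E")],
}

print(find_ambiguity(T_GRAMMAR, "T", 12))
print(find_ambiguity(IF_GRAMMAR, "G", 13))
```

The brute-force version blows up quickly as the bound grows, which is why a real tool prunes the search the way steps 3a to 3c describe.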
The DMS Software Reengineering Toolkit has a parser generator with this capability built in; we can simply try the grammars. I had to reformulate the grammars slightly to make them compatible with DMS, so I show the new version here:
Grammar1:
<T> ::= <T><Q> '0' ;
<T> ::= '2' '2' ;
<Q> ::= '2' ;
<Q> ::= '3' ;
DMS run on Grammar1:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar1.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 6 NRules = 4
*** LR(0) State Machine construction complete ***
States: 8
What next? ambiguities 10
Nonterminal <Q> is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...
*** Search for ambiguities to depth 3...
*** Search for ambiguities to depth 4...
Nonterminal <T> is not ambiguous
*** All ambiguities in grammar detected ***
The tool reports that all nonterminals are not ambiguous.
So, Grammar1 is not ambiguous.
Grammar2:
<first> = <first><first><second> ;
<second> = <third><second> ;
<third> = 'a' ;
<third> = 'b' ;
<third> = 'c' ;
DMS run on Grammar2:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar2.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
*** LR(0) State Machine construction complete ***
Determining if machine is SLR(0), SLR(1) or LALR(1)...
States: 9
Detecting possible cycles...
*** Circular definition:
Rule 1: <first> = <first> <first> <second> ;
*** Circular definition:
Rule 2: <second> = <third> <second> ;
What next? ambiguities 10
Nonterminal <first> is circularly defined
Nonterminal <second> is circularly defined
Nonterminal <third> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***
This grammar has a problem OP didn't ask about:
the tokens <first> and <second> are ill-defined
("circularly defined" according to this tool).
It should be clear that <first> expands starting
with <first>, but nothing is provided to tell
us what <first> can expand to as a concrete literal.
So the grammar isn't ambiguous... it is just outright
broken.
Grammar3:
<A> = <B><A><A> ;
<B> = '1' ;
<B> = '2' ;
<B> = '3' ;
<B> = '4' ;
DMS run on Grammar3:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar3.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
LR(0) State Machine Generation Phase.
*** LR(0) State Machine construction complete ***
States: 8
Detecting possible cycles...
*** Circular definition:
Rule 1: <A> = <B> <A> <A> ;
What next? ambiguities 10
Nonterminal <A> is circularly defined
Nonterminal <B> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***
This grammar is also broken in a way OP didn't discuss.
Here the problem is that we can find a substitution
for <A>, but it leads to an infinite expansion.
The grammar isn't ambiguous, but its only expansions are
infinitely long, so it accepts no finite string at all.
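The defect in Grammar2 and Grammar3 can be found without any search. The textbook check is to compute the "productive" nonterminals, those that can derive at least one finite string of terminals, by iterating to a fixed point. Whether this is what the tool does to report "circularly defined" is an assumption on my part; productive_nonterminals and the dict encodings below are names made up for this sketch.

```python
def productive_nonterminals(grammar):
    """Nonterminals that can derive at least one finite terminal string."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                # An alternative works if every symbol in it is a terminal
                # or a nonterminal already known to be productive.
                if all(s not in grammar or s in productive for s in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return productive

GRAMMAR1 = {
    "T": [("T", "Q", "0"), ("2", "2")],
    "Q": [("2",), ("3",)],
}
GRAMMAR2 = {
    "first": [("first", "first", "second")],
    "second": [("third", "second")],
    "third": [("a",), ("b",), ("c",)],
}
GRAMMAR3 = {
    "A": [("B", "A", "A")],
    "B": [("1",), ("2",), ("3",), ("4",)],
}

for name, g in [("Grammar1", GRAMMAR1), ("Grammar2", GRAMMAR2), ("Grammar3", GRAMMAR3)]:
    broken = sorted(set(g) - productive_nonterminals(g))
    print(name, "unproductive:", broken)
```

A nonterminal left out of the result has no rule that ever bottoms out in terminals, which matches the nonterminals the tool flags above.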
Now, none of these grammars are ambiguous in the sense the OP
actually wanted. Here I show a classic ambiguous grammar
based on if-then-else statements with dangling else:
Grammar4:
G = S ;
S = 'if' E 'then' S ;
S = 'if' E 'then' S 'else' S ;
S = V '=' E ;
DMS Run on Grammar4:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive ..\Tests\ifthenelse_ambiguous.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening ..\Tests\ifthenelse_ambiguous.bnf
<<<Rule Collection Completed>>>
NTokens = 9 NRules = 4
*** LR(0) State Machine construction complete ***
What next? ambiguities 10
Nonterminal G is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...
Ambiguous Rules:
S = 'if' E 'then' S 'else' S ; SemanticCopy2
S = 'if' E 'then' S ; SemanticCopy2
Instance: < 'if' E 'then' 'if' E 'then' S 'else' S >
Derivation:
1: < S 'else' S >
< S >
2: < 'if' E 'then' S 'else' S >
< 'if' E 'then' S 'else' S >
*** All ambiguities in grammar detected ***
The search finds an ambiguous instance phrase for statement. If you
look at the instance phrase, you should see that there is one else
clause... and the grammar allows it to attach itself to either if-then statement.
You don't need a tool like this for really tiny grammars; you can do it by looking at the rules and working it out. But for a large grammar, this is hard, and that's where a tool like this is really useful.
Consider a run on a Java version 8 grammar, with over 400 rules:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive "C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf"
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf
<<<Rule Collection Completed>>>
NTokens = 243 NRules = 410
*** LR(0) State Machine construction complete ***
States: 774
What next? ambiguities 15
Nonterminal optional_CONTROL_Z is not ambiguous
Nonterminal package_name_declaration is not ambiguous
Nonterminal anonymous_class_creation is not ambiguous
Nonterminal annotation_type_declaration is not ambiguous
Nonterminal annotation_interface_header is not ambiguous
Nonterminal default_value is not ambiguous
Nonterminal field_declaration is not ambiguous
Nonterminal member_value is not ambiguous
Nonterminal marker_annotation is not ambiguous
Nonterminal single_member_annotation is not ambiguous
Nonterminal enum_body is not ambiguous
Nonterminal type_parameters is not ambiguous
Nonterminal modifier is not ambiguous
Nonterminal local_variable_declaration is not ambiguous
Nonterminal vararg_parameter is not ambiguous
Nonterminal variable_declarator_id is not ambiguous
Nonterminal variable_initializer is not ambiguous
Nonterminal primitive_type is not ambiguous
Nonterminal try_resource_list_opt is not ambiguous
Nonterminal catch_statements_opt is not ambiguous
Nonterminal finally_statement_opt is not ambiguous
Nonterminal finally_statement is not ambiguous
Nonterminal switch_group is not ambiguous
Nonterminal switch_label is not ambiguous
Nonterminal catch_statement is not ambiguous
Nonterminal catch_parameter is not ambiguous
Nonterminal literal is not ambiguous
Nonterminal array_dims is not ambiguous
Nonterminal array_creation_with_initialization is not ambiguous
Nonterminal dim_spec is not ambiguous
Nonterminal superpath is not ambiguous
Nonterminal thispath is not ambiguous
Nonterminal target is not ambiguous
Nonterminal unary_expression is not ambiguous
Nonterminal lambda_body is not ambiguous
Nonterminal right_angle is not ambiguous
*** Search for ambiguities to depth 1...
Nonterminal type_declarations is not ambiguous
Nonterminal annotations_opt is not ambiguous
Nonterminal modifiers is not ambiguous
Nonterminal brackets is not ambiguous
Nonterminal switch_groups is not ambiguous
Nonterminal bounds_list is not ambiguous
*** Search for ambiguities to depth 2...
Nonterminal class_body is not ambiguous
Nonterminal arguments is not ambiguous
Nonterminal annotation_type_body is not ambiguous
Nonterminal qualified_name is not ambiguous
Nonterminal interface_body is not ambiguous
Nonterminal enum_class_header is not ambiguous
Nonterminal enum_class_body_opt is not ambiguous
Nonterminal block is not ambiguous
Ambiguous Rules:
executable_statement = 'if' '(' expression ')' executable_statement 'else' executable_statement ; SemanticCopy2
executable_statement = 'if' '(' expression ')' executable_statement ; SemanticCopy2
Instance: < 'if' '(' expression ')' 'if' '(' expression ')' executable_statement 'else' executable_statement >
Derivation:
1: < executable_statement 'else' executable_statement >
< executable_statement >
2: < 'if' '(' expression ')' executable_statement 'else' executable_statement >
< 'if' '(' expression ')' executable_statement 'else' executable_statement >
Nonterminal variable_declarator is not ambiguous
Nonterminal try_resource_list is not ambiguous
*** Search for ambiguities to depth 3...
Nonterminal type_arguments is not ambiguous
Nonterminal member_value_pair is not ambiguous
*** Search for ambiguities to depth 4...
Nonterminal array_creation_no_initialization is not ambiguous
Nonterminal array_creation_with_initialization_header is not ambiguous
*** Search for ambiguities to depth 5...
Nonterminal compilation_unit is not ambiguous
Nonterminal nested_class_declaration is not ambiguous
Nonterminal interface_header is not ambiguous
*** Search for ambiguities to depth 6...
Nonterminal name is not ambiguous
Nonterminal normal_annotation is not ambiguous
Nonterminal type_parameter is not ambiguous
*** Search for ambiguities to depth 7...
Nonterminal enum_constant is not ambiguous
Nonterminal bound is not ambiguous
*** Search for ambiguities to depth 8...
Nonterminal annotation is not ambiguous
Nonterminal type is not ambiguous
Nonterminal catch_statements is not ambiguous
Nonterminal value_suffix is not ambiguous
Ambiguous Rules:
method_reference = type '::' type_arguments IDENTIFIER ; SemanticCopy2
method_reference = primary '::' type_arguments IDENTIFIER ; SemanticCopy2
Instance: < IDENTIFIER '::' type_arguments IDENTIFIER >
Derivation:
1: < type '::' type_arguments IDENTIFIER >
< primary '::' type_arguments IDENTIFIER >
2: < type >
< primary >
3: < name brackets >
< primary >
4: < annotations_opt IDENTIFIER type_arguments brackets >
< primary >
5: < IDENTIFIER type_arguments brackets >
< primary >
6: < IDENTIFIER type_arguments brackets >
< primary_not_new_array >
7: < IDENTIFIER type_arguments brackets >
< IDENTIFIER >
8: < type_arguments brackets >
< >
*** Search for ambiguities to depth 9...
Nonterminal enum_constants is not ambiguous
Nonterminal type_argument is not ambiguous
*** Search for ambiguities to depth 10...
Nonterminal parameter is not ambiguous
*** Search for ambiguities to depth 11...
Nonterminal class_header is not ambiguous
Nonterminal nested_interface_declaration is not ambiguous
*** Search for ambiguities to depth 12...
Nonterminal import_statement is not ambiguous
Nonterminal type_declaration is not ambiguous
Nonterminal name_list is not ambiguous
Nonterminal variable_declarator_list is not ambiguous
Nonterminal formal_name_list is not ambiguous
*** Search for ambiguities to depth 13...
*** Search for ambiguities to depth 14...
Nonterminal method_declaration is not ambiguous
This takes about 5 minutes to run because it is computing an exponentially growing set of instance strings. But we learn:
1) Java has the dangling else problem, too! (In our parsers we handle
this with a "prefer shift on 'else'" rule, which this ambiguity detector
doesn't know about.)
2) The grammar rule for method_reference is ambiguous. I think
it is this way in the actual Java standard, too. This is actually
handled in our parsers in the name resolver, by looking at the type
of the IDENTIFIER.
It is easy to talk about a tool like this, but it's a lot trickier to code it and have it handle big grammars. I've run a 3000 rule COBOL grammar through our tool and had it check some 480 billion different string expansions. I still don't know if the whole grammar is ambiguous or not. (It did catch silly stuff, which we fixed.)

How to read alternates in EBNF grammars

I have an EBNF grammar that has a few rules with this pattern:
sequence ::=
item
| item extra* sequence
Is the above equivalent to the following?
sequence ::=
item (extra* sequence)*
Edit
Since some of you observed bugs or ambiguities in both sequences, I'll give a specific example. The SVG specification provides a grammar for path data. This grammar has several productions with this pattern:
lineto-argument-sequence:
coordinate-pair
| coordinate-pair comma-wsp? lineto-argument-sequence
Could the above be rewritten as the following?
lineto-argument-sequence:
coordinate-pair (comma-wsp? lineto-argument-sequence)*
Not really, they seem to have different bugs. The first sequence is ambiguous around "item" seeing that "extra" is optional. You could rewrite it as the following to remove ambiguity:
sequence3 ::=
item extra* sequence3
The second one is ambiguous around "extra", seeing as it is basically two nested loops both starting with "extra". You could rewrite it as the following to remove ambiguity:
sequence4 ::=
item ((extra|item))*
Your first version will likely choke on an input sequence consisting of a single "item" (it depends on the parser implementation) because it won't disambiguate.
My rewrites assume you want to match a sequence starting with "item" and optionally followed by a series of (0 or more) "item" or "extra" in any order.
e.g.
item
item extra
item extra item
item extra extra item
item item item item
item item item item extra
etc.
Without additional information I would personally be inclined towards the option I labeled "sequence4", as all the other options are merely using recursion as an expensive loop construct. If you are willing to give me more information I may be able to give a better answer.
EDIT: based on Jorn's excellent observation (with a small mod).
If you rewrite "sequence3" to remove recursion you get the following:
sequence5 ::=
(item extra*)+
I think this will be my preferred version, not "sequence4".
I have to point out that all three versions above are functionally equivalent (as recognizers or generators). The parse trees for 3 would be different to 4 and 5, but I cannot think that that would affect anything other than perhaps performance.
EDIT:
Concerning the following:
lineto-argument-sequence:
coordinate-pair
| coordinate-pair comma-wsp? lineto-argument-sequence
What this production says is that a lineto-argument-sequence is composed of at least one coordinate-pair followed by zero or more coordinate-pairs separated by optional whitespace/comma. Any of the following would constitute a lineto-argument-sequence (read -> as 'becomes'):
1,2 -> (1, 2)
1.5.6 -> (1.5, 0.6)
1.5.06 -> (1.5, 0.06)
2 3 3 4 -> (2,3) (3,4)
2,3-3-4 -> (2,3) (-3,-4)
2 3 3 -> ERROR
So a coordinate-pair is really any 2 consecutive numbers.
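The splits listed above follow from greedy, longest-match tokenizing of numbers. A rough Python sketch of that behaviour is below; NUMBER, SEPARATOR and coordinate_pairs are names invented here, the number pattern is a simplification rather than the full SVG number grammar, and the separator handling is more lenient than the spec (it would accept a leading comma, for instance).

```python
import re

# Simplified number: optional '-', then digits with an optional fraction,
# or a bare fraction, then an optional exponent.
NUMBER = re.compile(r"-?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?")
# Optional whitespace with at most one comma (roughly comma-wsp?).
SEPARATOR = re.compile(r"[ \t\r\n]*,?[ \t\r\n]*")

def coordinate_pairs(text):
    """Split `text` into numbers and group them two at a time."""
    numbers, pos = [], 0
    while True:
        pos = SEPARATOR.match(text, pos).end()  # always matches, maybe empty
        if pos == len(text):
            break
        m = NUMBER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        numbers.append(float(m.group()))
        pos = m.end()
    if not numbers or len(numbers) % 2:
        raise ValueError("expected a non-empty, even count of numbers")
    return list(zip(numbers[0::2], numbers[1::2]))

print(coordinate_pairs("1.5.6"))
print(coordinate_pairs("2,3-3-4"))
```

The second '.' in 1.5.6 cannot extend the number 1.5, so it has to start a new one, which is why no separator is needed there.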
I have mocked up a grammar in ANTLR that seems to work. Note the pattern used for lineto_argument_sequence is similar to the one Jorn and I recommended previously.
grammar SVG;
lineto_argument_sequence
: coordinate_pair (COMMA_WSP? coordinate_pair)*
;
coordinate_pair
: coordinate COMMA_WSP? coordinate
;
coordinate
: NUMBER
;
COMMA_WSP
: ( WS+|WS*','WS*) //{ $channel=HIDDEN; }
;
NUMBER
: '-'? (INT | FLOAT) ;
fragment
INT
: '0'..'9'+ ;
fragment
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
fragment
WS : ' ' | '\t' | '\r' | '\n' ;
fragment
EXPONENT
: ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
Given the following input:
2, 3 -3 -4 5.5.65.5.6
it produces this parse tree.
http://www.freeimagehosting.net/uploads/85fc77bc3c.png
This rule would also be equivalent to sequence ::= (item extra*)*, thus removing the recursion on sequence.
Yes, those two grammars describe the same language.
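A quick way to convince yourself is to write a recognizer for each form and compare them on all short inputs. In this sketch 'i' stands for item and 'e' for extra; seq_a, seq_b and chunks are names made up here, and the comparison only covers strings up to a fixed length, so it is evidence rather than a proof.

```python
import re
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def seq_a(s):
    """sequence ::= item | item extra* sequence"""
    if not s or s[0] != "i":
        return False
    if s == "i":
        return True
    rest, k = s[1:], 0
    while True:
        if seq_a(rest[k:]):          # extras consumed so far, then a sequence
            return True
        if k < len(rest) and rest[k] == "e":
            k += 1
        else:
            return False

@lru_cache(maxsize=None)
def seq_b(s):
    """sequence ::= item (extra* sequence)*"""
    return s[:1] == "i" and chunks(s[1:])

@lru_cache(maxsize=None)
def chunks(s):
    """(extra* sequence)*"""
    if s == "":
        return True
    k = 0
    while True:
        # After k extras, try every split into one sequence plus more chunks.
        for m in range(k + 1, len(s) + 1):
            if seq_b(s[k:m]) and chunks(s[m:]):
                return True
        if k < len(s) and s[k] == "e":
            k += 1
        else:
            return False

words = ["".join(w) for n in range(8) for w in product("ie", repeat=n)]
print(all(seq_a(w) == seq_b(w) for w in words))
print(all(seq_a(w) == bool(re.fullmatch(r"i(e*i)*", w)) for w in words))
# The (item extra*)+ form is looser: it also allows trailing extras.
print([w for w in words if re.fullmatch(r"(ie*)+", w) and not seq_a(w)][:3])
```

Both forms from the question accept exactly the strings that start and end with an item, i.e. item (extra* item)*. The (item extra*)+ rewrite additionally accepts inputs such as "item extra", so it is only a drop-in replacement if trailing extras are acceptable.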
But is that really EBNF? Wikipedia article on EBNF does not include the Kleene star operator.