Grammar Ambiguous or Not Ambiguous? - grammar

I'm new to BNF and have a tutorial question to solve. The question is given below.
'For each of the following grammars specify whether they are ambiguous or unambiguous'.
Grammar1:
<T> ::= <T> <Q> 0 | 22
<Q> ::= 2|3
Grammar2:
<first>::=<first><first><second>
<second>::=<third><second>
<third>::=a|b|c
Grammar3:
<A>::=<B><A><A>
<B>::=1|2|3|4
Can somebody please help me find the answers and explain them in a way that is easy to understand? That would be a great help.

To detect an ambiguity in the grammar, you need to exhibit a string that can be parsed two ways.
Finding such a string can be hard with a large grammar; in fact, it can be impossibly hard.
You can do this by hand by exploring various token sequences, but this gets dull fast and doesn't work in practice if the grammar is anything but trivial.
What you really want to do is build a tool that enumerates possible strings of characters and tries them to see if there is an ambiguity.
You can do this by brute force, simply generating all strings, but this rapidly produces many strings that are simply unparseable, and that doesn't help.
Or, you can generate strings using the grammar as a guide, ensuring that each extension to a proposed string produces something the grammar will still accept. This way all generated strings are valid, so at least you are producing valid junk.
You can do this with a depth-first search across the grammar rules. You end up mechanizing the following process:
1. Pick a pair of rules with the same LHS.
2. Instantiate s1 with the RHS of the first rule, s2 with the RHS of the second.
3. Repeat until you are tired (hit some search depth):
a. if s1 == s2, you've found an ambiguity.
b. if s1 derives a terminal that s2 does not derive,
then s1 and s2 cannot be ambiguous.
c. Pick a nonterminal in s1 or s2.
If there is none, then if s1 <> s2, this path doesn't lead to an ambiguity: backtrack.
d. Replace the nonterminal with a valid RHS for that nonterminal.
e. Recurse to a.
4. If all branches of the search lead to non-ambiguous strings,
then this rule isn't ambiguous.
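For tiny grammars you can get a feel for this with a much cruder check than the pairwise search outlined above. Here is a minimal Python sketch (an illustration only, not how DMS is implemented): it enumerates every leftmost derivation whose sentence has at most max_len tokens and reports any sentence reached by more than one derivation. It assumes the grammar has no empty rules and no cycles of unit rules; the function names and the dictionary encoding of the grammars are invented for this example.

```python
from collections import Counter

def count_derivations(grammar, start, max_len):
    """Count leftmost derivations of each sentence with at most max_len tokens.

    grammar maps a nonterminal to a list of right-hand sides (tuples of symbols).
    Assumption: no empty rules, so a sentential form never gets shorter.
    """
    counts = Counter()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            counts[form] += 1          # all terminals: one more derivation
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new_form) <= max_len:   # prune: forms only grow
                stack.append(new_form)
    return counts

def ambiguous_sentences(grammar, start, max_len):
    """Sentences (as strings) that have more than one leftmost derivation."""
    counts = count_derivations(grammar, start, max_len)
    return sorted(" ".join(s) for s, n in counts.items() if n > 1)

# Grammar1 from the question.
GRAMMAR1 = {
    "T": [("T", "Q", "0"), ("2", "2")],
    "Q": [("2",), ("3",)],
}

# The classic dangling-else grammar, with E and V treated as plain tokens.
GRAMMAR4 = {
    "G": [("S",)],
    "S": [("if", "E", "then", "S"),
          ("if", "E", "then", "S", "else", "S"),
          ("V", "=", "E")],
}

if __name__ == "__main__":
    print(ambiguous_sentences(GRAMMAR1, "T", 8))
    print(ambiguous_sentences(GRAMMAR4, "G", 13))
```

An empty result only says that no ambiguity shows up within the length bound; it is not a proof that the grammar is unambiguous.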
The DMS Software Reengineering Toolkit has a parser generator with this capability built in; we can simply try the grammars. I had to reformulate the grammars slightly to make them compatible with DMS, so I show the new version here:
Grammar1:
<T> ::= <T><Q> '0' ;
<T> ::= '2' '2' ;
<Q> ::= '2' ;
<Q> ::= '3' ;
DMS run on Grammar1:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar1.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 6 NRules = 4
*** LR(0) State Machine construction complete ***
States: 8
What next? ambiguities 10
Nonterminal <Q> is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...
*** Search for ambiguities to depth 3...
*** Search for ambiguities to depth 4...
Nonterminal <T> is not ambiguous
*** All ambiguities in grammar detected ***
The tool reports that all nonterminals are not ambiguous.
So, Grammar1 is not ambiguous.
Grammar2:
<first> = <first><first><second> ;
<second> = <third><second> ;
<third> = 'a' ;
<third> = 'b' ;
<third> = 'c' ;
DMS run on Grammar2:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar2.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
*** LR(0) State Machine construction complete ***
Determining if machine is SLR(0), SLR(1) or LALR(1)...
States: 9
Detecting possible cycles...
*** Circular definition:
Rule 1: <first> = <first> <first> <second> ;
*** Circular definition:
Rule 2: <second> = <third> <second> ;
What next? ambiguities 10
Nonterminal <first> is circularly defined
Nonterminal <second> is circularly defined
Nonterminal <third> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***
This grammar has a problem the OP didn't ask about:
the nonterminals <first> and <second> are ill-defined
("circularly defined" according to this tool).
It should be clear that <first> expands to something starting
with <first>, but nothing is provided to tell
us what <first> can expand to as a concrete literal.
So the grammar isn't ambiguous... it is just outright
broken.
Grammar3:
<A> = <B><A><A> ;
<B> = '1' ;
<B> = '2' ;
<B> = '3' ;
<B> = '4' ;
DMS run on Grammar3:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar3.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
LR(0) State Machine Generation Phase.
*** LR(0) State Machine construction complete ***
States: 8
Detecting possible cycles...
*** Circular definition:
Rule 1: <A> = <B> <A> <A> ;
What next? ambiguities 10
Nonterminal <A> is circularly defined
Nonterminal <B> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***
This grammar is also broken in a way OP didn't discuss.
Here the problem is that we can find a substitution
for <A>, but it leads to an infinite expansion.
The grammar isn't ambiguous, but it only accepts
infinitely long strings, which are not useful in practice.
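The "circularly defined" complaints for Grammar2 and Grammar3 can be reproduced with a standard fixpoint computation of the productive nonterminals, i.e. those that can derive at least one finite string of terminals. This is a minimal Python sketch under the assumption that any symbol without rules is a terminal; the function name and grammar encoding are invented for the example and have nothing to do with DMS internals.

```python
def productive_nonterminals(grammar):
    """Return the nonterminals that can derive some finite terminal string.

    grammar maps a nonterminal to a list of right-hand sides (tuples of symbols).
    Any symbol that is not a key of grammar is treated as a terminal.
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                # A rule is usable once every symbol in it is a terminal
                # or an already-productive nonterminal.
                if all(sym not in grammar or sym in productive for sym in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return productive

GRAMMAR1 = {"T": [("T", "Q", "0"), ("2", "2")], "Q": [("2",), ("3",)]}
GRAMMAR2 = {
    "first": [("first", "first", "second")],
    "second": [("third", "second")],
    "third": [("a",), ("b",), ("c",)],
}
GRAMMAR3 = {"A": [("B", "A", "A")], "B": [("1",), ("2",), ("3",), ("4",)]}

if __name__ == "__main__":
    for name, g in [("Grammar1", GRAMMAR1), ("Grammar2", GRAMMAR2), ("Grammar3", GRAMMAR3)]:
        broken = sorted(set(g) - productive_nonterminals(g))
        print(name, "unproductive nonterminals:", broken)
```

A nonterminal that never becomes productive has no finite expansion, which is exactly the situation of <first>, <second> and <A>.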
Now, none of these grammars are ambiguous in the sense the OP
actually wanted. Here I show a classic ambiguous grammar
based on if-then-else statements with dangling else:
Grammar4:
G = S ;
S = 'if' E 'then' S ;
S = 'if' E 'then' S 'else' S ;
S = V '=' E ;
DMS Run on Grammar4:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive ..\Tests\ifthenelse_ambiguous.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening ..\Tests\ifthenelse_ambiguous.bnf
<<<Rule Collection Completed>>>
NTokens = 9 NRules = 4
*** LR(0) State Machine construction complete ***
What next? ambiguities 10
Nonterminal G is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...
Ambiguous Rules:
S = 'if' E 'then' S 'else' S ; SemanticCopy2
S = 'if' E 'then' S ; SemanticCopy2
Instance: < 'if' E 'then' 'if' E 'then' S 'else' S >
Derivation:
1: < S 'else' S >
< S >
2: < 'if' E 'then' S 'else' S >
< 'if' E 'then' S 'else' S >
*** All ambiguities in grammar detected ***
The search finds an ambiguous instance phrase for statement. If you
look at the instance phrase, you should see that there is one else
clause... and the grammar allows it to attach itself to either if-then statement.
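To see the two attachments concretely, here is a small Python sketch that builds every parse tree of one token sequence for the S rules of Grammar4. It is a plain memoized enumeration, not a GLR parser, and it assumes E and V are plain tokens and that no rule is empty; the helper names parses and show are invented for this example.

```python
from functools import lru_cache

# The S rules of Grammar4, with E and V treated as tokens.
GRAMMAR = {
    "S": [("if", "E", "then", "S"),
          ("if", "E", "then", "S", "else", "S"),
          ("V", "=", "E")],
}

def parses(tokens, symbol="S"):
    """Return a tuple of all parse trees of tokens rooted at symbol."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(sym, i, j):
        # All trees for tokens[i:j] rooted at sym.
        if sym not in GRAMMAR:                      # terminal
            return (sym,) if j == i + 1 and tokens[i] == sym else ()
        trees = []
        for rhs in GRAMMAR[sym]:
            for children in split(rhs, i, j):
                trees.append((sym,) + children)
        return tuple(trees)

    def split(rhs, i, j):
        # Yield tuples of child trees such that rhs derives tokens[i:j].
        if not rhs:
            if i == j:
                yield ()
            return
        first, rest = rhs[0], rhs[1:]
        # Every symbol consumes at least one token (no empty rules).
        for k in range(i + 1, j - len(rest) + 1):
            for head in derive(first, i, k):
                for tail in split(rest, k, j):
                    yield (head,) + tail

    return derive(symbol, 0, len(tokens))

def show(tree):
    """Render a tree with one pair of brackets per S node."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(show(child) for child in tree[1:]) + "]"

if __name__ == "__main__":
    sentence = "if E then if E then V = E else V = E".split()
    for tree in parses(sentence):
        print(show(tree))
```

One bracketing keeps the else inside the inner if statement, the other hangs it on the outer one.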
You don't need a tool like this for really tiny grammars; you can do it by looking at the rules and working it out. But for a large grammar, this is hard, and that's where a tool like this is really useful.
Consider a run on a Java version 8 grammar, with over 400 rules:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive "C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf"
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf
<<<Rule Collection Completed>>>
NTokens = 243 NRules = 410
*** LR(0) State Machine construction complete ***
States: 774
What next? ambiguities 15
Nonterminal optional_CONTROL_Z is not ambiguous
Nonterminal package_name_declaration is not ambiguous
Nonterminal anonymous_class_creation is not ambiguous
Nonterminal annotation_type_declaration is not ambiguous
Nonterminal annotation_interface_header is not ambiguous
Nonterminal default_value is not ambiguous
Nonterminal field_declaration is not ambiguous
Nonterminal member_value is not ambiguous
Nonterminal marker_annotation is not ambiguous
Nonterminal single_member_annotation is not ambiguous
Nonterminal enum_body is not ambiguous
Nonterminal type_parameters is not ambiguous
Nonterminal modifier is not ambiguous
Nonterminal local_variable_declaration is not ambiguous
Nonterminal vararg_parameter is not ambiguous
Nonterminal variable_declarator_id is not ambiguous
Nonterminal variable_initializer is not ambiguous
Nonterminal primitive_type is not ambiguous
Nonterminal try_resource_list_opt is not ambiguous
Nonterminal catch_statements_opt is not ambiguous
Nonterminal finally_statement_opt is not ambiguous
Nonterminal finally_statement is not ambiguous
Nonterminal switch_group is not ambiguous
Nonterminal switch_label is not ambiguous
Nonterminal catch_statement is not ambiguous
Nonterminal catch_parameter is not ambiguous
Nonterminal literal is not ambiguous
Nonterminal array_dims is not ambiguous
Nonterminal array_creation_with_initialization is not ambiguous
Nonterminal dim_spec is not ambiguous
Nonterminal superpath is not ambiguous
Nonterminal thispath is not ambiguous
Nonterminal target is not ambiguous
Nonterminal unary_expression is not ambiguous
Nonterminal lambda_body is not ambiguous
Nonterminal right_angle is not ambiguous
*** Search for ambiguities to depth 1...
Nonterminal type_declarations is not ambiguous
Nonterminal annotations_opt is not ambiguous
Nonterminal modifiers is not ambiguous
Nonterminal brackets is not ambiguous
Nonterminal switch_groups is not ambiguous
Nonterminal bounds_list is not ambiguous
*** Search for ambiguities to depth 2...
Nonterminal class_body is not ambiguous
Nonterminal arguments is not ambiguous
Nonterminal annotation_type_body is not ambiguous
Nonterminal qualified_name is not ambiguous
Nonterminal interface_body is not ambiguous
Nonterminal enum_class_header is not ambiguous
Nonterminal enum_class_body_opt is not ambiguous
Nonterminal block is not ambiguous
Ambiguous Rules:
executable_statement = 'if' '(' expression ')' executable_statement 'else' executable_statement ; SemanticCopy2
executable_statement = 'if' '(' expression ')' executable_statement ; SemanticCopy2
Instance: < 'if' '(' expression ')' 'if' '(' expression ')' executable_statement 'else' executable_statement >
Derivation:
1: < executable_statement 'else' executable_statement >
< executable_statement >
2: < 'if' '(' expression ')' executable_statement 'else' executable_statement >
< 'if' '(' expression ')' executable_statement 'else' executable_statement >
Nonterminal variable_declarator is not ambiguous
Nonterminal try_resource_list is not ambiguous
*** Search for ambiguities to depth 3...
Nonterminal type_arguments is not ambiguous
Nonterminal member_value_pair is not ambiguous
*** Search for ambiguities to depth 4...
Nonterminal array_creation_no_initialization is not ambiguous
Nonterminal array_creation_with_initialization_header is not ambiguous
*** Search for ambiguities to depth 5...
Nonterminal compilation_unit is not ambiguous
Nonterminal nested_class_declaration is not ambiguous
Nonterminal interface_header is not ambiguous
*** Search for ambiguities to depth 6...
Nonterminal name is not ambiguous
Nonterminal normal_annotation is not ambiguous
Nonterminal type_parameter is not ambiguous
*** Search for ambiguities to depth 7...
Nonterminal enum_constant is not ambiguous
Nonterminal bound is not ambiguous
*** Search for ambiguities to depth 8...
Nonterminal annotation is not ambiguous
Nonterminal type is not ambiguous
Nonterminal catch_statements is not ambiguous
Nonterminal value_suffix is not ambiguous
Ambiguous Rules:
method_reference = type '::' type_arguments IDENTIFIER ; SemanticCopy2
method_reference = primary '::' type_arguments IDENTIFIER ; SemanticCopy2
Instance: < IDENTIFIER '::' type_arguments IDENTIFIER >
Derivation:
1: < type '::' type_arguments IDENTIFIER >
< primary '::' type_arguments IDENTIFIER >
2: < type >
< primary >
3: < name brackets >
< primary >
4: < annotations_opt IDENTIFIER type_arguments brackets >
< primary >
5: < IDENTIFIER type_arguments brackets >
< primary >
6: < IDENTIFIER type_arguments brackets >
< primary_not_new_array >
7: < IDENTIFIER type_arguments brackets >
< IDENTIFIER >
8: < type_arguments brackets >
< >
*** Search for ambiguities to depth 9...
Nonterminal enum_constants is not ambiguous
Nonterminal type_argument is not ambiguous
*** Search for ambiguities to depth 10...
Nonterminal parameter is not ambiguous
*** Search for ambiguities to depth 11...
Nonterminal class_header is not ambiguous
Nonterminal nested_interface_declaration is not ambiguous
*** Search for ambiguities to depth 12...
Nonterminal import_statement is not ambiguous
Nonterminal type_declaration is not ambiguous
Nonterminal name_list is not ambiguous
Nonterminal variable_declarator_list is not ambiguous
Nonterminal formal_name_list is not ambiguous
*** Search for ambiguities to depth 13...
*** Search for ambiguities to depth 14...
Nonterminal method_declaration is not ambiguous
This takes about 5 minutes to run because it is computing an exponentially growing set of instance strings. But we learn:
1) Java has the dangling else problem, too! (In our parsers we handle
this with a "prefer shift on 'else'" rule, which this ambiguity detector
doesn't know about.)
2) The grammar rule for method_reference is ambiguous. I think
it is this way in the actual Java standard, too. This is actually
handled in our parsers in the name resolver, by looking at the type
of the IDENTIFIER.
It's easy to talk about a tool like this, but it's a lot trickier to code it and have it handle big grammars. I've run a 3000 rule COBOL grammar through our tool and had it check some 480 billion different string expansions. I still don't know if the whole grammar is ambiguous or not. (It did catch silly stuff, which we fixed.)

Related

Rascal: Grammar to Parse BNF

I want to write a concrete grammar to parse BNF-like syntax definitions.
Looking at the EXP Concrete Syntax recipe I created this very simple first version:
module BNFParser
lexical Identifier = [a-z]+ ;
syntax GrammarRule = left RuleHead ":" RuleCase* ";" ;
syntax RuleHead = Identifier ;
syntax RuleCase = Identifier ;
and invoked it in the Repl like this:
import BNFParser;
import ParseTree;
parse(#GrammarRule, "foo : bar baz ;");
But this results in a rather arcane error message:
|std:///ParseTree.rsc|(13035,1963,<393,0>,<439,114>): ParseError(|unknown:///|(3,1,<1,3>,<1,4>))
at *** somewhere ***(|std:///ParseTree.rsc|(13035,1963,<393,0>,<439,114>))
at parse(|std:///ParseTree.rsc|(14991,5,<439,107>,<439,112>))
ok
I also tried using the start keyword ahead of GrammarRule, but that didn't help. What am I doing wrong?
lexical Identifier = [a-z]+ !>> [a-z];
This fixes ambiguous lists of identifiers. The additional !>> constraint declares that an identifier is only acceptable if it is not immediately followed by another [a-z] character, so it always consumes as many letters as it can.
To fix the parse error, you also need a layout definition:
layout Whitespace = [\ \n\r]*;
Rascal inserts this layout nonterminal between all symbols of every syntax rule in scope. It leaves the lexical rules alone.
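To illustrate why the follow restriction matters once layout may be empty, here is a rough Python model (not Rascal, and not how Rascal's parser works internally) that counts the ways a text can be read as a sequence of [a-z]+ identifiers separated by optional whitespace. The function count_readings and its follow_restriction flag are invented for this sketch; whitespace is skipped greedily to keep the model small.

```python
def count_readings(text, follow_restriction):
    """Count ways to read text as identifiers [a-z]+ with optional layout between.

    With follow_restriction=True an identifier may not be directly followed
    by another lowercase letter, mimicking the effect of  !>> [a-z].
    """
    def is_letter(ch):
        return "a" <= ch <= "z"

    def count(pos):
        # Skip layout (possibly empty) before the next identifier.
        while pos < len(text) and text[pos] in " \n\r":
            pos += 1
        if pos == len(text):
            return 1
        total = 0
        end = pos
        # Try every possible end position for an identifier starting at pos.
        while end < len(text) and is_letter(text[end]):
            end += 1
            blocked = (follow_restriction
                       and end < len(text)
                       and is_letter(text[end]))
            if not blocked:
                total += count(end)
        return total

    return count(0)

if __name__ == "__main__":
    print(count_readings("bar baz", follow_restriction=False))
    print(count_readings("bar baz", follow_restriction=True))
```

Without the restriction, "bar" can be read as one identifier or as several adjacent shorter ones; with it, only the longest reading survives.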

CFG: Why is this grammar ambiguous?

The grammar is as follows.
S -> SS' | a | b
S' -> a | b
The way I understand it, derivations from this grammar will be like SS'S'S'S'... (0 or more S'), where each S or S' will generate a or b.
Can someone provide an example that shows this grammar is ambiguous? (The solution says it is.)
It isn't ambiguous. Your analysis is correct.
Here's a mechanical check of your grammar (reshaped for our tool):
S = S Sprime ;
S = a ;
S = b ;
Sprime = a ;
Sprime = b ;
Execution of tool:
C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator>run ParserGenerator.P0B -interactive C:\
DMS GLR Parser Generator 2.4.1
Copyright (C) 1997-2018 Semantic Designs, Inc.
Opening C:\temp\Example.bnf
*** EOF seen
<<<Rule Collection Completed>>>
NTokens = 5 NRules = 5
LR(1) Parser Generator -- Find Follow and SLR Lookahead sets
Computing MemberSets for Nonterminal Tokens...
What next? ambiguities 100
Print results where (<CR> defaults to console)?
Default paper width: 80
How wide should the printout be (<CR> selects default)?
*** Search for ambiguities to depth 100
Nonterminal < Sprime > is not ambiguous
*** Search for ambiguities to depth 1; trying 2 rule pairs...
*** Search for ambiguities to depth 2; trying 2 rule pairs...
*** Search for ambiguities to depth 3; trying 2 rule pairs...
*** Search for ambiguities to depth 4; trying 2 rule pairs...
Nonterminal < S > is not ambiguous [modulo rule derivation loops]
*** 0 ambiguities found ***
*** All ambiguities in grammar detected ***
This tool is rather overkill for a grammar with two nonterminals. But when somebody gives you a set of 200 nonterminals, it is much harder to do by hand.
(For theorists: this tool obviously can't decide this for all grammars.
It uses a recursive iterative-deepening search in the space of nonterminal expansions to look for duplicate/ambiguous expansions. That works pretty well in practice.)

shift/reduce error in yacc

I know this part of my grammar causes the error, but I don't know how to fix it. I even used %left and %right, but it didn't help. Can anybody please help me find out what the problem with this grammar is?
Thanks in advance for your help.
%token VARIABLE NUM
%right '='
%left '+' '-'
%left '*' '/'
%left '^'
%start S_PROOP
EQUATION_SEQUENCE
: FORMULA '=' EQUATION
;
EQUATION
: FORMULA
| FORMULA '=' EQUATION
;
FORMULA
: SUM EXPRESSION
| PRODUCT EXPRESSION
| EXPRESSION '+' EXPRESSION
| EXPRESSION '*' EXPRESSION
| EXPRESSION '/' EXPRESSION
| EXPRESSION '^' EXPRESSION
| EXPRESSION '-' EXPRESSION
| EXPRESSION
;
EXPRESSION
: EXPRESSION EXPRESSION
| '(' EXPRESSION ')'
| NUM
| VARIABLE
;
Normal style is to use lower case for non-terminals and upper case for terminals; using upper case indiscriminately makes your grammar harder to read (at least for those of us used to normal yacc/bison style). So I've written this answer without so much recourse to the caps lock key.
The basic issue is the production
expression: expression expression
which is obviously ambiguous, since it does not provide any indication of associativity. In that, it is not different from
expression: expression '+' expression
but that conflict can be resolved using a precedence declaration:
%left '+'
The difference is that the first production does not have any terminal symbol, which makes it impossible to use precedence rules to disambiguate: in yacc/bison, precedence is always a comparison between a potential reduction and a potential shift. The potential reduction is some production which could be reduced; the potential shift is a terminal symbol which might be able to extend some production. Since the potential shift must be a terminal symbol, that is what is used in the precedence declaration; by default, the precedence of the potential reduction is defined by the last terminal symbol in the right-hand side but it is possible to specify a different terminal using a %prec marker. In any case, the precedence relation involves a terminal symbol, and if the grammar allows juxtaposition of two terminals, there is no relevant terminal symbol.
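The comparison described above can be modelled in a few lines of Python. This is only a sketch of the decision rule, not yacc's actual code: the table mirrors the %right/%left declarations from the question, and the names rule_precedence and resolve are made up for the example.

```python
# Precedence levels from the question's declarations, lowest first.
PRECEDENCE = {
    "=": (1, "right"),
    "+": (2, "left"), "-": (2, "left"),
    "*": (3, "left"), "/": (3, "left"),
    "^": (4, "left"),
}
TERMINALS = set(PRECEDENCE) | {"(", ")", "NUM", "VARIABLE"}

def rule_precedence(rhs):
    """Precedence of a production: that of the last terminal in its right-hand side."""
    for sym in reversed(rhs):
        if sym in TERMINALS:
            return PRECEDENCE.get(sym)
    return None  # no terminal at all, so no precedence

def resolve(rhs, lookahead):
    """Decide between reducing by rhs and shifting lookahead."""
    rule = rule_precedence(rhs)
    token = PRECEDENCE.get(lookahead)
    if rule is None or token is None:
        return "conflict"          # precedence cannot help here
    if token[0] > rule[0]:
        return "shift"
    if token[0] < rule[0]:
        return "reduce"
    # Same level: associativity decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[token[1]]

if __name__ == "__main__":
    print(resolve(("expression", "+", "expression"), "*"))
    print(resolve(("expression", "+", "expression"), "+"))
    print(resolve(("expression", "expression"), "NUM"))
```

The last call is the juxtaposition rule: its right-hand side contains no terminal, so there is nothing to compare and the conflict stays unresolved.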
That's easy to work around, since you are under no obligation to use precedence to resolve conflicts. You could just avoid the conflict:
/* Left associative rule */
expr_sequence: expr | expr_sequence expr
/* Alternative: right associative rule */
expr_sequence: expr | expr expr_sequence
Since there is no indication what you intend by the juxtaposition, I'm unable to recommend one or the other of the above alternatives, but normally I would incline towards the first one.
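If it helps to visualise what the two alternatives do to a sequence, here is a tiny Python sketch that builds the nesting each rule would produce. The function names are invented; the tuples simply stand in for parse tree nodes.

```python
from functools import reduce

def group_left(items):
    """Nesting produced by  expr_sequence: expr | expr_sequence expr."""
    return reduce(lambda acc, item: (acc, item), items)

def group_right(items):
    """Nesting produced by  expr_sequence: expr | expr expr_sequence."""
    *init, last = items
    return reduce(lambda acc, item: (item, acc), reversed(init), last)

if __name__ == "__main__":
    print(group_left(["a", "b", "c"]))   # (('a', 'b'), 'c')
    print(group_right(["a", "b", "c"]))  # ('a', ('b', 'c'))
```

In an LR parser the left-recursive form also keeps the parser stack shallow, since each expr is reduced into the sequence as soon as it is seen.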
That's not terribly different from your grammar for equation_sequence, although equation_sequence actually uses a terminal symbol so it could have been handled with a precedence declaration. It's worth noting that equation_sequence, as written, is right-associative. That's usually considered correct for assignment operators: a = b = c + 3, in a language like C, is parsed as a = (b = c + 3) and not as (a = b) = c + 3, making assignment one of the few right-associative operators. But if you are using = as an equality operator, it might not actually be what you intended.

ANTLR v4: Same character has different meaning in different contexts

This is my first crack at parser generators, and, consequently ANTLR. I'm using ANTLR v4 trying to generate a simple practice parser for Morse Code with the following extra rules:
A letter (e.g., ... [the letter 's']) can be denoted as capitalized if a '^' precedes it
ex.: ^... denotes a capital 'S'
Special characters can be embedded in parentheses
ex.: (#)
Each encoded entity will be separated by whitespace
So I could encode the following sentence:
ABC a#b.com
as (with corresponding letters shown underneath):
^.- ^-... ^-.-. ( ) .- (#) -... (.) -.-. --- --
A B C ' ' a '#' b '.' c o m
Particularly note the two following entities: ( ) (which denotes a space) and (.) (which denotes a period).
There is mainly one thing that I'm finding hard to wrap my head around: the same token can take on different meanings depending on whether it is in parentheses or not. That is, I want to tell ANTLR that I want to discard whitespace, yet not in the ( ) case. Also, a Morse Code character can consist of dots-and-dashes (periods-and-dashes), yet I don't want to consider the period in (.) as "any character".
Here is the grammar I have got so far:
grammar MorseCode;
file: entity*;
entity:
special
| morse_char;
special: '(' SPECIAL ')';
morse_char: '^'? (DOT_OR_DASH)+;
SPECIAL : .; // match any character
DOT_OR_DASH : ('.' | '-');
WS : [ \t\r\n]+ -> skip; // we don't care about whitespace (or do we?)
When I try it against the following input:
^... --- ...(#)
I get the following output (from grun ... -tokens):
[#0,0:0='^',<1>,1:0]
[#1,1:1='.',<4>,1:1]
...
[#15,15:14='<EOF>',<-1>,1:15]
line 1:1 mismatched input '.' expecting DOT_OR_DASH
It seems there is trouble with ambiguity between SPECIAL and DOT_OR_DASH?
It seems like your (#) syntax behaves like a quoted string in other programming languages. I would start by defining SPECIAL as:
SPECIAL : '(' .*? ')';
To ensure that . . and .. are actually different, you can use this:
SYMBOL : [.-]+;
Then you can define your ^ operator:
CARET : '^';
With these three tokens (and leaving WS as-is), you can simplify your parser rules significantly:
file
: entity* EOF
;
entity
: morse_char
| SPECIAL
;
morse_char
: CARET? SYMBOL
;
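To check that these three tokens are enough for the encoding in the question, here is a small Python model of the same token split (a hand-written regex tokenizer, not ANTLR output). The MORSE table only covers the letters used in the example, and the names tokenize and decode are invented for this sketch. Python tries the alternatives in order rather than picking the longest match as an ANTLR lexer does; that makes no difference here because each token type starts with a different character.

```python
import re

# One alternative per token type; WS is matched and then dropped.
TOKEN = re.compile(
    r"(?P<SPECIAL>\(.*?\))|(?P<SYMBOL>[.-]+)|(?P<CARET>\^)|(?P<WS>[ \t\r\n]+)",
    re.DOTALL,
)

# Partial table: just enough for the example sentence.
MORSE = {".-": "a", "-...": "b", "-.-.": "c", "---": "o", "--": "m", "...": "s"}

def tokenize(text):
    """Split text into (kind, value) pairs, skipping whitespace between tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def decode(text):
    """Decode a Morse string following  entity: CARET? SYMBOL | SPECIAL."""
    out = []
    upper = False
    for kind, value in tokenize(text):
        if kind == "CARET":
            upper = True
        elif kind == "SYMBOL":
            letter = MORSE[value]
            out.append(letter.upper() if upper else letter)
            upper = False
        else:  # SPECIAL: keep whatever is between the parentheses
            if upper:
                raise ValueError("'^' must be followed by a Morse symbol")
            out.append(value[1:-1])
    return "".join(out)

if __name__ == "__main__":
    print(decode("^.- ^-... ^-.-. ( ) .- (#) -... (.) -.-. --- --"))
```

Because ( ) and (.) are each matched as a single SPECIAL token, the space and the period inside them never reach the WS or SYMBOL rules.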

How to solve a shift/reduce conflict?

I'm using CUP to create a parser that I need for my thesis. I have a shift/reduce conflict in my grammar. I have this production rule:
command ::= IDENTIFIER | IDENTIFIER LPAREN parlist RPAREN;
and I have this warning:
Warning : *** Shift/Reduce conflict found in state #3
between command ::= IDENTIFIER (*)
and command ::= IDENTIFIER (*) LPAREN parlist RPAREN
under symbol LPAREN
Now, I actually wanted it to shift so I'm pretty ok with it, but my professor told me to find a way to solve the conflict. I'm blind. I've always read about the if/else conflict but to me this doesn't seem the case.
Can you help me?
P.S.: IDENTIFIER, LPAREN "(" and RPAREN ")" are terminal, parlist and command are not.
Your problem is not in those rules at all. Although Michael Mrozek's answer is the correct approach to resolving the "dangling else problem", it does not address the problem at hand.
If you look at the error message, you see that the shift/reduce conflict occurs when the next token is LPAREN. I am pretty sure that these two rules alone will not create a conflict.
I can't see the rest of your grammar, so I can't point at the exact cause. But your conflict probably arises when a command is followed by a different rule that starts with LPAREN.
Look at any other rules that can potentially be after command and start with LPAREN. You will then have to consolidate the rules. There is a very good chance that your grammar is erroneous for a specific input.
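One way to hunt for such rules is to compute FOLLOW sets and check whether LPAREN can follow command. Below is a minimal Python sketch of the textbook FIRST/FOLLOW fixpoint for grammars without empty rules. The rest of the grammar was not posted, so the program and group rules here are purely hypothetical stand-ins, as are the function names. CUP builds LALR(1) tables, whose lookaheads are subsets of FOLLOW, so LPAREN appearing in FOLLOW(command) is a hint about where to look, not proof of a conflict.

```python
def first_sets(grammar):
    """FIRST sets for a grammar with no empty rules."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    """FOLLOW sets for a grammar with no empty rules; '$' marks end of input."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue          # terminals have no FOLLOW set
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # sym ends the rule
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

# HYPOTHETICAL grammar: only the two command rules come from the question.
GRAMMAR = {
    "program": [("command", "group")],
    "group": [("LPAREN", "command", "RPAREN")],
    "command": [("IDENTIFIER",),
                ("IDENTIFIER", "LPAREN", "parlist", "RPAREN")],
    "parlist": [("IDENTIFIER",)],
}

if __name__ == "__main__":
    print(sorted(follow_sets(GRAMMAR, "program")["command"]))
```

In this made-up grammar a group can follow a command, so after reading IDENTIFIER the parser cannot tell from LPAREN alone whether to reduce or to keep shifting.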
You have two productions:
command ::= IDENTIFIER
command ::= IDENTIFIER LPAREN parlist RPAREN;
It's a shift/reduce conflict when the input tokens are IDENTIFIER LPAREN, because:
LPAREN could be the start of a new production you haven't listed, in which case the parser should reduce the IDENTIFIER already on the stack into command, and have command LPAREN remaining
They could both be the start of the second production, so it should shift the LPAREN onto the stack next to IDENTIFIER and keep reading, trying to find a parlist.
You can fix it by doing something like this:
command ::= IDENTIFIER command2
command2 ::= LPAREN parlist RPAREN |;
Try to set a precedence:
precedence left LPAREN, RPAREN;
It forces CUP to decide the conflict, taking the left match.