How can I show that this grammar is ambiguous? - grammar

I want to prove that this grammar is ambiguous, but I'm not sure how I am supposed to do that. Do I have to use parse trees?
S -> if E then S | if E then S else S | begin S L | print E
L -> end | ; S L
E -> i

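You can show it is ambiguous by finding a string that parses in more than one way. For example, the string if i then if i then print i else print i has two parses (the parentheses below only show the grouping; they are not part of the string):
if i then ( if i then print i else print i )
if i then ( if i then print i ) else print i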
This happens to be the classic "dangling else" ambiguity. Googling your tag(s), title & grammar gives other hits.
However, if you can't guess an ambiguous string, googling your tag(s) & title turns up this answer:
how can i prove that this grammar is ambiguous?
There is no easy method for proving a context-free grammar ambiguous -- in fact,
the question is undecidable, by reduction from the Post correspondence problem.
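
For a single candidate string, the "more than one parse" check can be done mechanically. Below is a minimal Python sketch (not from the original answer) that counts the parse trees of a whitespace-separated token string under the grammar from the question. The names GRAMMAR, count_parses, derive and match are made up for illustration, and the approach assumes a grammar without empty productions or unit cycles, which holds here.

```python
from functools import lru_cache

# Grammar from the question; each nonterminal maps to its alternatives.
GRAMMAR = {
    "S": [
        ["if", "E", "then", "S"],
        ["if", "E", "then", "S", "else", "S"],
        ["begin", "S", "L"],
        ["print", "E"],
    ],
    "L": [["end"], [";", "S", "L"]],
    "E": [["i"]],
}


def count_parses(text, start="S"):
    """Count the distinct parse trees of a whitespace-separated token string."""
    tokens = tuple(text.split())

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(match(tuple(body), i, j) for body in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def match(body, i, j):
        # Number of ways the symbol sequence `body` derives tokens[i:j].
        if not body:
            return 1 if i == j else 0
        head, rest = body[0], body[1:]
        # Assumption: every symbol covers at least one token (no empty productions).
        return sum(
            derive(head, i, k) * match(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return derive(start, 0, len(tokens))


print(count_parses("if i then if i then print i else print i"))
```

A count above 1 for any string proves the grammar ambiguous. A count of 1 for the strings you happened to try proves nothing, which is consistent with the undecidability result quoted above.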

You can put the grammar into a parser generator which supports all context-free grammars, a context-free general parser generator. Generate the parser, then parse a string which you think is ambiguous and find out by looking at the output of the parser.
A context-free general parser generator generates parsers which produce all derivations in polynomial time. Examples of such parser generators include SDF2, Rascal, DMS, Elkhound, ART. There is also a backtracking version of yacc (btyacc) but I don't think it does it in polynomial time. Usually the output is encoded as a graph where alternative trees for sub-sentences are encoded with a nested set of alternative trees.

Related

Antlr 4 get (print) all parse trees when there is ambiguity

Consider the following ANTLR 4 grammar:
grammar Test;
start: e EOF;
e : e '+' e #op
| NUMBER #atom
;
NUMBER: [0-9]+;
Based on the disambiguation rules of ANTLR, in this case binary operators being left-associative, the result of parsing the string 1+2+3 is ((1+2)+3). But there is another possible parse tree, namely (1+(2+3)), if you don't consider ANTLR's default disambiguation rule.
Is there a way to get both parse trees in ANTLR? Or at least enabling a flag or something, so that it tells me that there was another parse tree and possibly print it?
Update
I understand that in ANTLR, this grammar is unambiguous, because binary operators are always left-associative, but I couldn't come up with another example. My whole point is that I'd like to get a warning (or something similar) whenever ANTLR tries to resolve the ambiguity. For example, in good old Yacc (Bison), if I have the same grammar:
s : e
;
e : e '+' e
| NUMBER
;
when generating the parser, I get the warning State 6 conflicts: 1 shift/reduce.
There's no ambiguity in this small grammar. There are 2 alts in e, each with a definitive path. An ambiguity would be something like this:
e = a b | a c;
where a parser needs some lookahead to determine which path to take. But back to your parse tree question. What you want is to define a different associativity. Normally, all operators are left-associative by default, leading to the parse tree ((1+2)+3) for your input.
Defining the operator + to be right-associative like so:
grammar Example;
start: e EOF;
e : <assoc=right> e '+' e #op
| NUMBER #atom
;
NUMBER: [0-9]+;
leads to the parse tree (1+(2+3)).
Update
To get notified whenever an ambiguity is found, use your error listener's reportAmbiguity function, which is triggered in that case. Override it to do your own handling in this situation.
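
Independently of ANTLR, the two trees mentioned in the question can be enumerated by brute force. The following Python sketch is not ANTLR code and is not part of the original answer. It hand-codes the rule e : e '+' e | NUMBER, ignores any associativity rule, and returns every possible parse as a bracketed string. The function name parse_trees is made up for illustration.

```python
def parse_trees(tokens):
    """Return every parse of `tokens` under  e : e '+' e | NUMBER  as a bracketed string."""
    if len(tokens) == 1:
        # Alternative 2: a single NUMBER token.
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    for k, tok in enumerate(tokens):
        if tok == "+":
            # Alternative 1: try every '+' as the top-level operator.
            for left in parse_trees(tokens[:k]):
                for right in parse_trees(tokens[k + 1:]):
                    trees.append(f"({left}+{right})")
    return trees


for tree in parse_trees("1 + 2 + 3".split()):
    print(tree)
```

For 1+2+3 this yields both (1+(2+3)) and ((1+2)+3). The number of trees grows quickly with the number of operators, so this is only practical for small inputs.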

Difference between grammar rules

Say there are two grammar rules
Rule 1 B -> aB | cB
and
Rule 2 B -> Ba | Bc
I'm a bit confused as to the difference between these two. Would Rule 1's expression be (a+c)*? Then what would Rule 2's expression be?
Both of those grammars yield the empty language since there is no non-recursive rule, so no sentence consisting only of terminals can be derived.
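If you add the production B→ε, both grammars would yield the same language, equivalent to the regular expression (a+c)*. However, the parse trees produced by the two grammars would be quite different: Rule 1 is right-recursive, so its trees nest to the right, while Rule 2 is left-recursive, so its trees nest to the left.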
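
To make the difference in tree shape concrete, here is a small Python sketch (an illustration, not part of the original answer) that builds the parse tree as nested tuples for each grammar, assuming the extra production B→ε has been added. The function names are made up for this example.

```python
def right_recursive(s):
    """Tree for B -> aB | cB | ε: each node is (symbol, subtree), nesting to the right."""
    if not s:
        return "ε"
    if s[0] not in "ac":
        raise ValueError(f"unexpected symbol {s[0]!r}")
    return (s[0], right_recursive(s[1:]))


def left_recursive(s):
    """Tree for B -> Ba | Bc | ε: each node is (subtree, symbol), nesting to the left."""
    if not s:
        return "ε"
    if s[-1] not in "ac":
        raise ValueError(f"unexpected symbol {s[-1]!r}")
    return (left_recursive(s[:-1]), s[-1])


print(right_recursive("aca"))
print(left_recursive("aca"))
```

Both functions accept exactly the strings over a and c, but the first peels symbols off the front and the second peels them off the back, which is the structural difference between the two rules.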

Antlr rule for matching filename

I am looking for a good way to match a filename in Antlr.
The filename could be DOS or Unix style.
If you have a good solution to that, feel free to ignore the rest of this question, because it is just my newbie attempt at solving the problem and I am probably way off. I have included it because some people like to see sample code.
For purposes of discussion, here is what I am thinking. This is not my actual grammar; all I am interested in for this discussion is filename parsing, so I reduced the sample to something that is still somewhat meaningful in that context.
Lexer.g4:
lexer grammar Lexer;
K_COPY : C O P Y ;
FILEPATH: [-.a-zA-Z0-9:/\\]+; // a literal backslash must be escaped inside a character set
Parser.g4
parser grammar Parser;
options { tokenVocab=Lexer; }
commandfile: (statement NEWLINE)* EOF;
statement : copy_stmt
;
copy_stmt: K_COPY left=filepath right=filepath
;
// Add characters as we make rules as to what characters are valid:
filepath: FILEPATH;
That is what I am thinking but I am new to Antlr so I wanted to get some feedback before I proceed.
Using Antlr for this project is already decided, and a good part of the project is already working in Antlr, so I am only looking for Antlr-based solutions.
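
One quick way to sanity-check a character class like the one in FILEPATH before committing to it is to try the equivalent regular expression outside ANTLR. The Python sketch below is only an approximation: it assumes the intended set is letters, digits, -, ., :, / and a literal backslash, and the sample paths are made up for illustration.

```python
import re

# Assumed equivalent of the FILEPATH set, with the backslash escaped.
FILEPATH = re.compile(r"[-.a-zA-Z0-9:/\\]+")

samples = [
    r"C:\temp\file-1.txt",      # DOS style
    "/usr/local/bin/tool-1.0",  # Unix style
    "my_file.txt",              # underscore is not in the set
    "my file.txt",              # neither is a space
]

for path in samples:
    print(path, bool(FILEPATH.fullmatch(path)))
```

The last two samples are rejected, which suggests the set would need more characters (at least _) before it covers common real-world filenames.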

type3-only lexers in ANTLR4?

I'm thinking about using ANTLR in my lecture on formal languages since its input language is pretty clean and easy to learn.
Since I am not an expert in using ANTLR, I tried some standard examples to get familiar with its syntax, error messages etc.
Doing so, I found out that:
lexer grammar KFG;
R : 'a'R'b' | 'ab';
is a valid lexer that can be executed e.g. by:
echo "aaabbb" | grun KFG tokens -tokens
Since the grammar is context-free, it should only be parsable by a parser and not a lexer.
Is there any way to force ANTLR to accept only type 3 grammars for lexers?
Cheers,
Alex
Is there any way to force ANTLR to accept only type 3 grammars for lexers?
AFAIK, no, that is not possible.
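
For the lecture, it may help to show what the recursive lexer rule actually accepts. The Python sketch below (an illustration only, not ANTLR output) recognizes R : 'a' R 'b' | 'ab' by direct recursion on the rule. The function name matches_R is made up for this example.

```python
def matches_R(s):
    """Recognize R : 'a' R 'b' | 'ab' by direct recursion on the rule."""
    if s == "ab":  # second alternative
        return True
    # First alternative: strip one 'a' in front and one 'b' at the end, then recurse.
    return len(s) > 2 and s[0] == "a" and s[-1] == "b" and matches_R(s[1:-1])


for word in ["ab", "aaabbb", "aab", "abab"]:
    print(word, matches_R(word))
```

The accepted words are exactly a^n b^n for n ≥ 1, the textbook example of a language that is context-free but not regular, so the rule really does go beyond type 3.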

Yacc "rule useless due to conflicts"

I need some help with yacc.
I'm working on an infix/postfix translator. The infix to postfix part was really easy, but I'm having some issues with the postfix to infix translation.
Here's an example of what I was going to do (just to translate an easy ab+c- or an abc+-):
exp: num {printf("+ ");} exp '+'
| num {printf("- ");} exp '-'
| exp {printf("+ ");} num '+'
| exp {printf("- ");} num '-'
|/* empty*/
;
num: number {printf("%d ", $1);}
;
Obviously it doesn't work because I'm asking for an action (the printfs) before the actual body so, when compiling, I get many
warning: rule useless in parser due to conflict
The problem is that the printfs are exactly where I need them (or my output won't be an infix expression). Is there a way to keep the print actions right there and let yacc identify which one it needs to use?
Basically, no there isn't. The problem is that to resolve what you've got, yacc would have to have an unbounded amount of lookahead. This is… problematic given that yacc is a fairly simple-minded tool, so instead it takes a (bad) guess and throws out some of your rules with a warning. You need to change your grammar so yacc can decide what to do with a token with only a very small amount of lookahead (a single token IIRC). The usual way to do this is to attach the interpretations of the values to the tokens and either use a post-action or, more practically, build a tree which you traverse as a separate step (doing print out of an infix expression from its syntax tree is trivial).
Note that when you've got warnings coming out of yacc, that typically means that your grammar is wrong and that the resulting parser will do very unexpected things. Refine it until you get no warnings from that stage at all. That is, treat grammar warnings as errors; anything else and you'll be sorry.
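
The "build a tree, then traverse it" idea can be sketched outside yacc. The Python example below is a minimal illustration, not the yacc solution itself: it builds the syntax tree from postfix tokens with a stack and prints it as a fully parenthesized infix expression in a separate step. The function names and the tuple representation of nodes are assumptions made for this example.

```python
def build_tree(tokens):
    """Build a syntax tree from postfix tokens; operator nodes are (op, left, right)."""
    stack = []
    for tok in tokens:
        if tok in ("+", "-"):
            if len(stack) < 2:
                raise ValueError("operator without two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
        else:
            stack.append(tok)  # operand: a leaf
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def to_infix(node):
    """Traverse the tree and render it as fully parenthesized infix."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_infix(left)} {op} {to_infix(right)})"
    return node


print(to_infix(build_tree("a b + c -".split())))
print(to_infix(build_tree("a b c + -".split())))
```

In a yacc grammar the same shape would come from post-actions that build nodes (for example exp: exp exp '+'), with the printing done after the parse, so no action has to run before the parser knows which rule applies.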