How to remove ambiguity in the following grammar?

S -> Sa | SbSa | ε
I found a similar question, but I don't understand it: http://automatasteps.blogspot.co.id/2007/08/unambiguous-grammar.html
How do I change this into an unambiguous grammar?
My test string is bbaaa.
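Counting parse trees shows the ambiguity concretely. The Python sketch below counts the derivations of a string under the original grammar and under a candidate unambiguous grammar, S -> S a D | D with D -> b D a D | ε. The candidate is a suggestion added here, not taken from the question or the linked page. It treats each b as an opening bracket matched by a later a, and every other a as a free a sitting between balanced blocks. In the original grammar, the same a can play either role, which is where the extra trees come from.

```python
from functools import lru_cache


def count_original(w: str) -> int:
    """Count parse trees of w for S -> S a | S b S a | ε."""

    @lru_cache(maxsize=None)
    def s(i: int, j: int) -> int:
        total = 1 if i == j else 0  # S -> ε
        if j > i and w[j - 1] == "a":
            total += s(i, j - 1)  # S -> S a
            # S -> S b S a: try every 'b' that could split w[i:j-1]
            for k in range(i, j - 1):
                if w[k] == "b":
                    total += s(i, k) * s(k + 1, j - 1)
        return total

    return s(0, len(w))


def count_candidate(w: str) -> int:
    """Count parse trees of w for the candidate grammar (an assumption):
    S -> S a D | D    (top-level a's are the unmatched ones)
    D -> b D a D | ε  (each b is matched with exactly one later a)
    """

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        total = 1 if i == j else 0  # D -> ε
        if j > i and w[i] == "b":
            # D -> b D a D: m is the position of the a that matches w[i]
            for m in range(i + 1, j):
                if w[m] == "a":
                    total += d(i + 1, m) * d(m + 1, j)
        return total

    @lru_cache(maxsize=None)
    def s(i: int, j: int) -> int:
        total = d(i, j)  # S -> D
        # S -> S a D: m is the position of the last unmatched a
        for m in range(i, j):
            if w[m] == "a":
                total += s(i, m) * d(m + 1, j)
        return total

    return s(0, len(w))


print(count_original("bbaaa"))   # 3 parse trees: ambiguous
print(count_candidate("bbaaa"))  # 1 parse tree
```

Both grammars describe strings in which every b has a matching a somewhere after it. In the candidate grammar, that matching can be done in only one way, so each string gets at most one tree.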


Optional rule (myRule?) vs rule and empty alternative ((myRule | ))

In the ANTLR v4 grammar in the grammars-v4 repository (https://github.com/antlr/grammars-v4/blob/master/antlr4/ANTLRv4Parser.g4), the optional rule ebnfSuffix is:
sometimes matched using ebnfSuffix? (see lexerElement);
sometimes matched using (ebnfSuffix | ) (see element).
I have been asking myself, and am now asking here, whether the two have slightly different meanings.
The grammars-v4 repository has another example of the same two patterns in https://github.com/antlr/grammars-v4/blob/master/cql3/CqlParser.g4, where the beginBatch rule is used both as an optional element and together with an empty alternative.
EDIT: As suggested, here is the part of the grammar I'm referring to:
lexerElement
    : labeledLexerElement ebnfSuffix?   // case 1: optional rule
    | lexerAtom ebnfSuffix?
    | lexerBlock ebnfSuffix?
    | actionBlock QUESTION?
    ;
element
    : labeledElement (ebnfSuffix |)    // case 2: block with empty alternative
    | atom (ebnfSuffix |)
    | ebnf
    | actionBlock QUESTION?
    ;
Both ebnfSuffix? and (ebnfSuffix | ) result in exactly the same behaviour: they (greedily) optionally match ebnfSuffix.
Both forms may appear in the same grammar because it was translated from some spec (or another grammar) whose notation lacked the ? operator, but that's just a guess.
Personally I'd just use ebnfSuffix?.

How do I properly parse Regex in ANTLR

I want to parse this:
VALID_EMAIL_REGEX = /\A[\w+\-.]+#[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
and, of course, other variations of regular expressions.
Does anyone know how to do this properly?
Thanks in advance.
Edit: I tried throwing all the regex symbols and characters into one lexer rule, like this:
REGEX : ( DIV | ('i') | ('#') | ('[') | (']') | ('+') | ('.') | ('*') | ('-') | ('\\') | ('(') | (')') | ('A') | ('w') | ('a') | ('z') | ('Z') ) ;
and then wrote a parser rule like this:
regex_assignment : (REGEX)+ ;
but I get recognition errors (extraneous input). This is surely because these characters are, of course, already used in earlier rules.
The thing is, I don't actually need to process these regex assignments; I just want them to be recognized without errors. Does anyone have an approach for this in ANTLR? A solution that simply recognizes this as a regex and, for example, skips it would be enough for me.
Unfortunately, there is no regex grammar yet in the ANTLR grammar repository, but similar questions have come up before, e.g. Regex Grammar. Once you have the (E)BNF, you can convert it to ANTLR. Alternatively, you can use the BNF grammar to check whether your own grammar rules are defined correctly. Simply throwing all possible input characters into a single rule won't work.
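Since the asker only needs the regex to be recognized and skipped, one common alternative (not mentioned in the answer above) is to lex the whole /.../flags literal as a single token instead of one token per character. The Python sketch below shows the scanning logic such a token rule needs: skip escaped characters, ignore / inside a [...] character class, then consume trailing flag letters. The function name scan_regex_literal is hypothetical. The sketch assumes the caller already knows that this / starts a regex rather than a division.

```python
def scan_regex_literal(src: str, start: int) -> tuple[str, int]:
    """Return (literal, end) for a /.../flags literal that begins at src[start].

    Hypothetical helper: it assumes the caller already decided that this '/'
    opens a regex literal rather than being a division operator.
    """
    if src[start] != "/":
        raise ValueError("regex literal must start with '/'")
    i = start + 1
    in_class = False  # inside [...], an unescaped '/' does not end the literal
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "/" and not in_class:
            i += 1
            while i < len(src) and src[i].isalpha():  # trailing flags, e.g. i
                i += 1
            return src[start:i], i
        elif ch == "\n":
            break  # regex literals are assumed not to span lines
        i += 1
    raise ValueError("unterminated regex literal")


line = r"VALID_EMAIL_REGEX = /\A[\w+\-.]+#[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i"
literal, end = scan_regex_literal(line, line.index("/"))
print(literal)
```

In a real lexer, deciding whether / opens a regex or is a division operator usually depends on the previous token. That context check is left out here.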

Context Free Grammar tips

I've come across this problem while studying context-free grammars, and I have no idea how to derive the production rules from this English description.
Language L is defined as:
"All odd-length strings over {a, b}∗ with middle symbol a."
You can build the string outward from the middle, always adding one letter to each side:
A -> aAa | aAb | bAa | bAb | a
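A quick way to convince yourself that the grammar is right is to generate every string it derives up to some length and compare the result against a brute-force enumeration of the definition. The Python sketch below does that. The names generate and spec are illustrative.

```python
from itertools import product


def generate(max_len: int) -> set[str]:
    """Every string of length <= max_len derivable from A -> aAa | aAb | bAa | bAb | a."""
    result: set[str] = set()
    layer = {"a"} if max_len >= 1 else set()  # A -> a: the middle symbol
    while layer:
        result |= layer
        # Apply one more A -> xAy step (x, y in {a, b}) while the length allows it
        layer = {x + w + y for w in layer for x in "ab" for y in "ab" if len(w) + 2 <= max_len}
    return result


def spec(max_len: int) -> set[str]:
    """Brute force: odd-length strings over {a, b} (up to max_len) whose middle symbol is a."""
    return {
        "".join(t)
        for n in range(1, max_len + 1, 2)
        for t in product("ab", repeat=n)
        if t[n // 2] == "a"
    }


print(generate(7) == spec(7))  # True
```

Each derivation step adds exactly one symbol on each side, so the a from the final A -> a always stays in the middle.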

regular/context free grammar

I'm hoping someone can help me understand a question I have. It's not homework, just an example question I'm trying to work out. The problem is to define a grammar that generates all sums of any number of operands, for example 54 + 3 + 78 + 2 + 5, etc. The easiest way I found to define the problem is:
non-terminal {S,B}
terminal {0..9,+,epsilon}
Rules:
S -> [0..9]S
S -> + B
B -> [0..9]B
B -> + S
S -> epsilon
B -> epsilon
epsilon is the empty string.
This seems correct to me: you define the first number recursively with the first rule, use the second rule to add the next integer, and define that integer with the third rule. You can then use the fourth rule to go back to S and define as many integers as you need.
This looks like a regular grammar to me, since every rule has the form A -> aB or A -> a (or A -> epsilon), but the notes for this question say that it is not possible to define this language with a regular grammar. Can anyone explain why my attempt is wrong and why this needs to be context-free?
Thanks.
Although it's not the formal definition, it's easier to think of a non-regular language as one that needs to balance something (like parentheses).
Your problem can be solved with recursion only at the ends of the rules, never in the middle, so it can be described by a regular grammar. (Again, this is not the formal definition, but it's easier to remember!)
For example, with a regular expression engine (like the ones in Perl or JavaScript), one could simply write /(\d+)(\+(\d+))*/.
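You can try that pattern with Python's re module. re.fullmatch anchors the pattern to the whole input, which the unanchored /.../ literal above does not do. [0-9] is used instead of \d because, in Python 3, \d also matches non-ASCII digits. Whitespace is omitted, as in the grammars in this thread. The last few test strings are also derived by the question's grammar (for example S -> + B -> + gives "+", and S -> epsilon gives the empty string). This suggests that grammar over-generates, even though the language itself is regular.

```python
import re

# Anchored equivalent of /(\d+)(\+(\d+))*/ restricted to the terminals 0..9 and +
SUM = re.compile(r"[0-9]+(\+[0-9]+)*")


def is_sum(s: str) -> bool:
    """True if s is one or more numbers joined by '+', with no spaces."""
    return SUM.fullmatch(s) is not None


for s in ["54+3+78+2+5", "7", "+", "1++2", "1+", ""]:
    print(repr(s), is_sum(s))  # only the first two print True
```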
You could write it this way:
non-terminal {S,R,N,N'}
terminal {0..9,+,epsilon}
Rules:
S -> N R
S -> epsilon
N -> [0..9] N'
N' -> N
N' -> epsilon
R -> + N R
R -> epsilon
This should work correctly.
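This grammar is LL(1), so it maps directly onto a recursive-descent recognizer with one function per non-terminal. Each function takes a position and returns the position after what it matched, or None on failure. The Python sketch below is only an illustration; the function names are made up and do not come from the thread.

```python
from typing import Optional

DIGITS = "0123456789"


def parse_n(s: str, i: int) -> Optional[int]:
    # N -> [0..9] N'
    if i < len(s) and s[i] in DIGITS:
        return parse_n_prime(s, i + 1)
    return None


def parse_n_prime(s: str, i: int) -> Optional[int]:
    # N' -> N | epsilon  (choose N when a digit follows, otherwise epsilon)
    if i < len(s) and s[i] in DIGITS:
        return parse_n(s, i)
    return i


def parse_r(s: str, i: int) -> Optional[int]:
    # R -> + N R | epsilon
    if i < len(s) and s[i] == "+":
        j = parse_n(s, i + 1)
        return None if j is None else parse_r(s, j)
    return i


def parse_s(s: str) -> bool:
    # S -> N R | epsilon; accept only if the whole input is consumed
    if s == "":
        return True
    j = parse_n(s, 0)
    if j is None:
        return False
    return parse_r(s, j) == len(s)


for s in ["54+3+78+2+5", "", "+", "1++2", "1+"]:
    print(repr(s), parse_s(s))
```

Because of S -> epsilon, this grammar also accepts the empty string, which /(\d+)(\+(\d+))*/ does not.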
The language is regular. A regular expression would be:
((0|1|2|...|9)*(0|1|2|...|9)+)*(0|1|2|...|9)*(0|1|2|...|9)
Terminals are: {0,1,2,...,9,+}
"|" means union, * stands for the Kleene star (star closure), and + is the literal plus terminal, not the "one or more" operator.
If you need to have "(" and ")" in your language, then it will not be regular as it needs to match parentheses.
A sample context-free grammar would be:
E->E+E
E->(E)
E->F
F-> 0F | 1F | 2F | ... | 9F | 0 | 1 | ... | 9
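E -> E+E is left-recursive (and ambiguous), so a hand-written recursive-descent recognizer cannot follow it literally. The Python sketch below assumes the standard rewrite E -> T (+ T)*, T -> (E) | F, which generates the same strings. The class and method names are illustrative. The T rule is where parentheses get matched: after consuming "(", it must find a complete E and then ")". A regular expression cannot do this kind of balancing for arbitrary nesting depth.

```python
DIGITS = "0123456789"


class Recognizer:
    """Recursive-descent recognizer for the language of
    E -> E+E | (E) | F,  F -> 0F | ... | 9F | 0 | ... | 9,
    written in the equivalent non-left-recursive form
    E -> T ('+' T)*,  T -> '(' E ')' | F."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Next character, or "" at end of input
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> bool:
        if self.peek() == ch:
            self.pos += 1
            return True
        return False

    def e(self) -> bool:
        if not self.t():
            return False
        while self.expect("+"):
            if not self.t():
                return False
        return True

    def t(self) -> bool:
        if self.expect("("):
            # A "(" must be followed by a full E and then its matching ")"
            return self.e() and self.expect(")")
        return self.f()

    def f(self) -> bool:
        start = self.pos
        while self.peek() != "" and self.peek() in DIGITS:
            self.pos += 1
        return self.pos > start  # at least one digit


def accepts(text: str) -> bool:
    r = Recognizer(text)
    return r.e() and r.pos == len(text)


for s in ["1+2", "(1+(23+4))", "((7))", "(1+2", "1+", "()"]:
    print(s, accepts(s))
```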

How do I add parenthesis to this rule?

I have a left-recursive rule like the following:
EXPRESSION : EXPRESSION BINARYOP EXPRESSION | UNARYOP EXPRESSION | NUMBER;
I need to add parentheses to it, but I'm not sure how to make a left parenthesis require a matching right parenthesis while keeping the parentheses optional. Can someone show me how? (Or am I trying to do far too much in the lexer, and should I leave some or all of this to the parser?)
You could add a recursive alternative:
EXPRESSION : EXPRESSION BINARYOP EXPRESSION
| UNARYOP EXPRESSION
| NUMBER
| OPENPARENS EXPRESSION CLOSEPARENS
;
Yes, you're trying to do too much in the lexer. Here's how to get around the left-recursive rules:
http://www.antlr.org/wiki/display/ANTLR3/Expression+evaluator (see how the parser rule expr trickles down to the rule atom and then gets called recursively from atom again)
HTH
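For comparison, here is a Python sketch of the layering the linked ANTLR 3 evaluator uses. There is one rule per precedence level, and the lowest rule (atom) handles numbers and parenthesized sub-expressions by calling back into the top rule. The concrete operators (+, -, *, / and unary -) are assumptions that stand in for the question's BINARYOP and UNARYOP. The code is illustrative and is not taken from the wiki page. Because atom consumes "(" and then insists on ")", every left parenthesis needs a matching right one, yet parentheses remain optional.

```python
import re


def tokenize(text: str) -> list[str]:
    # Numbers are runs of ASCII digits; every other non-space character is its own token
    tokens = re.findall(r"[0-9]+|\S", text)
    for tok in tokens:
        if not (tok[0] in "0123456789" or tok in "+-*/()"):
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens


class Parser:
    """expr  -> term (('+' | '-') term)*
    term  -> unary (('*' | '/') unary)*
    unary -> '-' unary | atom
    atom  -> NUMBER | '(' expr ')'"""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):  # loop instead of left recursion
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.atom()

    def atom(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()  # recursion back to the top-level rule
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok[0] in "0123456789":
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2*(3+4)"))
```

The while loops in expr and term play the role that left recursion plays in the question's rule, and they keep the operators left-associative (so 10-4-3 is (10-4)-3).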