What language does this mean? - grammar

This is the language:
L = { w ∈ {a,b,c}* | |w| = 3 · #a(w) }
where #a(w) is the number of a's in w. What does that mean?

It means that L is the language of strings w over the symbols 'a', 'b' and 'c' in which the length of w equals 3 times the number of occurrences of 'a' in w.
The productions of the grammar should be such that whenever one adds an 'a', it also adds two further symbols: two 'b's, two 'c's, or one 'b' and one 'c'. Check the grammar below:
S → ^ | SaSMSM | SMSaSM | SMSMSa
M → b | c
Here ^ means epsilon.
To generate aabbcc, use a leftmost derivation (at every step the leftmost nonterminal is replaced):
S → SaSMSM
Replace the first S on the right-hand side by ^ using S → ^:
S → SaSMSM → aSMSM
Replace the leftmost S using S → SaSMSM:
S → SaSMSM → aSMSM → aSaSMSMMSM
Use S → ^:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM
Use S → ^:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM
Use M → b:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM → aabSMMSM
Use S → ^:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM → aabSMMSM → aabMMSM
Use M → b:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM → aabSMMSM → aabMMSM → aabbMSM
Use M → c:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM → aabSMMSM → aabMMSM → aabbMSM → aabbcSM
Use S → ^:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM → aabSMMSM → aabMMSM → aabbMSM → aabbcSM → aabbcM
Use M → c:
S → SaSMSM → aSMSM → aSaSMSMMSM → aaSMSMMSM → aaMSMMSM → aabSMMSM → aabMMSM → aabbMSM → aabbcSM → aabbcM → aabbcc
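The derivation can be replayed mechanically. The following is only a sketch: the names PRODUCTIONS, in_language and leftmost_derive are made up for this illustration, the empty string stands for ^, and every uppercase letter is assumed to be a nonterminal.

```python
# Grammar from the answer above; "" stands for epsilon (written ^ in the text).
PRODUCTIONS = {
    "S": ["", "SaSMSM", "SMSaSM", "SMSMSa"],
    "M": ["b", "c"],
}


def in_language(w):
    """Membership test read directly off the set definition of L."""
    return set(w) <= set("abc") and len(w) == 3 * w.count("a")


def leftmost_derive(start, choices):
    """Apply each chosen right-hand side to the leftmost nonterminal."""
    form = start
    steps = [form]
    for rhs in choices:
        # Index of the leftmost nonterminal in the current sentential form.
        pos = next(i for i, ch in enumerate(form) if ch in PRODUCTIONS)
        assert rhs in PRODUCTIONS[form[pos]], "not a production of this grammar"
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(form)
    return steps


# Same sequence of rule applications as in the derivation above.
choices = ["SaSMSM", "", "SaSMSM", "", "", "b", "", "b", "c", "", "c"]
steps = leftmost_derive("S", choices)
print(" → ".join(steps))
print(in_language(steps[-1]))
```

The last sentential form is aabbcc, and in_language confirms that its length (6) is three times its number of a's (2).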

S -> aBCS | aBC
S -> BCaS | BCa
S -> BaCS | BaC
S -> aCBS | aCB
S -> CaBS | CaB
S -> CBaS | CBa
B -> b
C -> c
Might this be right?
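One way to answer that is a brute-force comparison on short strings. This is a sketch under one assumption: with B -> b and C -> c, the grammar just above derives exactly the concatenations of one or more permutations of "abc". The helper names generated and target are hypothetical.

```python
from itertools import permutations, product

# The six blocks aBC, BCa, BaC, aCB, CaB, CBa after B -> b and C -> c.
BLOCKS = {"".join(p) for p in permutations("abc")}


def generated(max_blocks):
    """Strings the grammar above derives using at most max_blocks blocks."""
    out = set()
    for k in range(1, max_blocks + 1):
        out |= {"".join(combo) for combo in product(BLOCKS, repeat=k)}
    return out


def target(max_len):
    """Strings of L (|w| = 3 * number of a's) of length up to max_len."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("abc", repeat=n))
    return {w for w in words if len(w) == 3 * w.count("a")}


missing = sorted(target(3) - generated(1))
print(missing)
```

Everything this grammar generates is in L, but the comparison lists strings of L it cannot derive: the empty string, and strings such as abb or cca where the two companions of the 'a' are the same letter. So it appears to describe only a subset of L.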

Related

BNF grammar associativity

I'm trying to understand how left and right associative grammars work and I need a little help. So I decided to come up with an example and ask for some clarification.
Basically, I want to create a grammar for two logical operations: and + implication. I want to make it so and is left associative and implication is right associative. This is what I got so far. Is this correct? I feel like it might be ambiguous. (I also kept in mind that and has higher precedence than implication)
<exp> := <and>
<and> := <impl> | <and> ^ <impl>
<impl> := <term> | <term> -> <impl>
<term> := (<exp>) | <bool>
<bool> := true | false
From my limited knowledge, it seems to me that you got the precedences inverted.
At the grammar level, a left associative operator has the following format:
exp = exp op other | other
...and a right associative operator would have the following format:
exp = other op exp | other
As you can see, it depends on your use of recursion: left associativity would use a left recursive rule while right associativity would use a right recursive one.
As for precedence, the later a rule is in the grammar, the higher its precedence. In the grammar below, where opL represents a left-associative operator and opR represents a right-associative one, exp0 has lower precedence than exp1, which has lower precedence than other:
exp0 = exp0 opL exp1 | exp1
exp1 = other opR exp1 | other
other = ...
As an example, if opL is "+" and opR is "**" and other is a letter, see how the parse tree for a few expressions would be built:
Left associativity:
a + b + c -> (a + b) + c
exp0 -+-> exp0 +-> exp0 --> exp1 --> other --> a
      |        |
      |        +-> opL --> "+"
      |        |
      |        \-> exp1 --> other --> b
      |
      +-> opL --> "+"
      |
      \-> exp1 --> other --> c
Right Associativity:
a ** b ** c -> a ** (b ** c)
exp0 --> exp1 +-> other --> a
              |
              +-> opR --> "**"
              |
              \-> exp1 +-> other --> b
                       |
                       +-> opR --> "**"
                       |
                       \-> exp1 --> other --> c
Precedence:
a + b ** c -> a + (b ** c)
exp0 +-> exp0 --> exp1 --> other --> a
     |
     +-> opL --> "+"
     |
     \-> exp1 +-> other --> b
              |
              +-> opR --> "**"
              |
              \-> exp1 --> other --> c
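The same grouping falls out of a small recursive-descent parser for the exp0/exp1/other grammar. This is a minimal sketch, assuming single lowercase letters for other, "+" for opL and "**" for opR; the function names are made up for the illustration, and the left-recursive exp0 rule is written as a loop, which is the usual way to handle left recursion by hand.

```python
import re


def tokenize(text):
    # "**" has to be tried before the single-character alternatives.
    return re.findall(r"\*\*|[a-z]|\+", text)


def parse(text):
    """Return the input fully parenthesised according to the grammar."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def other():
        return take()

    def exp1():
        # exp1 = other opR exp1 | other   (right recursion: right associative)
        left = other()
        if peek() == "**":
            take()
            return f"({left} ** {exp1()})"
        return left

    def exp0():
        # exp0 = exp0 opL exp1 | exp1     (left recursion, written as a loop)
        left = exp1()
        while peek() == "+":
            take()
            left = f"({left} + {exp1()})"
        return left

    return exp0()


print(parse("a + b + c"))    # ((a + b) + c)
print(parse("a ** b ** c"))  # (a ** (b ** c))
print(parse("a + b ** c"))   # (a + (b ** c))
```

Because exp0 only ever calls exp1 for its operands, "**" always binds before "+", which is the precedence rule described above.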

Context Free Grammar BNF

I need help with a non-extended BNF grammar:
Σ = {a,b,c}
L = {ω ∈ Σ* | all a's (if any) come before all c's (if any)}
For example, the strings aba, cbc, and abacbc are in the language, but the string abcabc is not.
This is what I have so far (is it correct? Please correct me if I am wrong):
S -> aSbSc | bSaSc | aScSb | ε
Your comment says you want equal numbers of a and c, so start with the simple grammar that does that:
S -> aSc | ε
and add in any number of b's before/after/between those:
S -> BaScB | B
B -> Bb | ε
Note that the above is not ambiguous (it's even LR(1)).
If you want to allow a different number of a's and c's, you can use the same approach to avoid ambiguity. Start with just the a's and c's:
S -> AC
A -> Aa | ε
C -> Cc | ε
and add in b's at the beginning and after each other character:
S -> BAC
A -> AaB | ε
C -> CcB | ε
B -> Bb | ε
Does the number of a's and c's need to be the same? If not, then you are missing the cases where they differ, such as aac. I think something like this should work:
S -> AC
A -> aA | bA | ε
C -> bC | cC | ε
The A production is used for deriving a sequence of characters that are not a c and the C production is used for deriving a sequence of characters that are not an a.
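A quick sanity check of that last grammar: A derives any string over {a, b} and C any string over {b, c}, so S -> AC should correspond to the regular expression [ab]*[bc]*. That equivalence is my reading of the grammar, not something stated in the answer. The sketch below compares it against a direct reading of the definition on all short strings; both function names are hypothetical.

```python
import re
from itertools import product

# Assumed regex equivalent of S -> AC, A -> aA | bA | ε, C -> bC | cC | ε.
A_THEN_C = re.compile(r"[ab]*[bc]*")


def matches_grammar(w):
    return A_THEN_C.fullmatch(w) is not None


def matches_definition(w):
    # Direct reading of the definition: no 'a' occurs anywhere after a 'c'.
    return all("a" not in w[i:] for i, ch in enumerate(w) if ch == "c")


# Exhaustive comparison on every string over {a, b, c} up to length 6.
words = ["".join(t) for n in range(7) for t in product("abc", repeat=n)]
disagreements = [w for w in words if matches_grammar(w) != matches_definition(w)]
print(disagreements)
```

On the examples from the question, aba, cbc and abacbc are accepted and abcabc is rejected.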

Ambiguous grammar

I am looking at the following grammar and I believe it's ambiguous at line 3, but I'm not sure.
<SL> → <S>
<SL> → <SL> <S>
<S> → i <B> <S> e <S>
<S> → i <B> <S>
<S> → x
<S> → y
<B> → 5
<B> → 13
I found this string, xi13yi5xeyx, which I believe generates two different parse trees, but I'm not sure if I'm doing it wrong.
Can someone please verify my findings?
Yes, your grammar is ambiguous!
You have not mentioned it, but I assume <SL> is the start variable.
Using your grammar rules we can draw more than one parse tree (two) for the string i5i5yey, as follows:
           <SL>                    <SL>
            |                       |
           <S>                     <S>
        / / | \ \                 / |  \
     /  /   |   \  \             /  |     \
   i  <B>  <S>   e  <S>         i  <B>      <S>
       |  / | \      |              |    / / | \ \
       5 i <B> <S>   y              5  /  /  |  \  \
            |   |                     i <B> <S>  e  <S>
            5   y                        |   |       |
                                         5   y       y
The two parse trees have different structures, so the grammar is ambiguous!
You can try the same construction on the string xi13yi5xeyx (I am leaving this as an exercise for you).
Importantly, the language generated by this grammar is not inherently ambiguous: it is possible to write an equivalent unambiguous grammar that always yields a unique parse tree for each string in the language.
HINT: To write an unambiguous grammar:
The grammar is quite similar to the grammar for the if statement in C (different languages have different syntax for if), and the problem is solved in almost every compiler design book.
Resolving the General Dangling Else/If-Else Ambiguity
Reference: the book Compilers: Principles, Techniques, and Tools by Aho and Ullman, Section 4.5, "Conflicts During Shift-Reduce Parsing".
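The two trees for i5i5yey can be double-checked by counting parse trees mechanically. The sketch below is hand-written for this particular grammar (it is not a general parser), and the function names are made up. It assumes the input is tokenized into i, e, x, y, 5 and 13, and it relies on the fact that the left-recursive <SL> rule gives exactly one <SL> structure for each way of splitting the input into statements.

```python
import re
from collections import Counter
from functools import lru_cache


def count_parse_trees(text):
    tokens = tuple(re.findall(r"13|5|[iexy]", text))
    n = len(tokens)

    @lru_cache(maxsize=None)
    def s_spans(i):
        """Map end index j -> number of parse trees of <S> over tokens[i:j]."""
        spans = Counter()
        if i >= n:
            return spans
        if tokens[i] in ("x", "y"):
            spans[i + 1] += 1
        elif tokens[i] == "i" and i + 1 < n and tokens[i + 1] in ("5", "13"):
            for j, ways in s_spans(i + 2).items():
                spans[j] += ways                     # <S> -> i <B> <S>
                if j < n and tokens[j] == "e":
                    for k, more in s_spans(j + 1).items():
                        spans[k] += ways * more      # <S> -> i <B> <S> e <S>
        return spans

    @lru_cache(maxsize=None)
    def sl_count(i):
        """Number of parse trees of <SL> over tokens[i:]."""
        total = 0
        for j, ways in s_spans(i).items():
            total += ways * (1 if j == n else sl_count(j))
        return total

    return sl_count(0)


print(count_parse_trees("i5i5yey"))
print(count_parse_trees("xi13yi5xeyx"))
```

For i5i5yey this counts two trees, matching the drawing. For the string from the question, xi13yi5xeyx, it counts only one: the single e there has just one i it can attach to, so that particular string does not seem to witness the ambiguity even though the grammar is ambiguous.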

First & Follow, Arithmetic Expressions

FIRST(A) = { b, epsilon }
FIRST(S) = { b, epsilon }
FOLLOW(S) = { a, $ }
FOLLOW(A) = { a, b, $ }
What is the grammar (arithmetic expression) for these FIRST and FOLLOW sets?
FIRST(X) = the terminals which can appear first when trying to parse the non-terminal X. If it can match an empty string, epsilon is also included.
FOLLOW(X) = the terminals which can appear immediately after the non-terminal X. This is a union of the FIRST-sets of all symbols appearing after X in any parsing rule.
Read more: LL parser
The clues given are:
FIRST(A), FIRST(S) ⇒ All of the derivations of A and S respectively, must either begin with the terminal b, or be zero-length.
S → b ... | ε
A → b ... | ε
FOLLOW(S) ⇒ There must be some construction where S is followed by the terminal a, or a non-terminal which can begin with a. (Neither A nor S qualify).
S → b S a | ε
A → b ... | ε
FOLLOW(A) ⇒ There must be some construction where A is followed by each of the terminals a and b, or some non-terminal which can begin with those.
S → b S a | ε
A → b A b | b A a | ε
FOLLOW(A) ⇒ Assuming S is the start-symbol, A must appear at the end of some branch of S, possibly followed by other nullable non-terminals.
S → b S a | A | ε
A → b A b | b A a | ε
(NB. Adding A to S did not break the constraint on FIRST(S))
We can make the grammar a little smaller:
S → b S a | A | ε
A → b A b | ε
We can no longer generate strings like "bbbabb", but it does not violate the constraints.
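To check that the final grammar really has the requested sets, FIRST and FOLLOW can be computed with the usual fixed-point iteration. A minimal sketch, assuming S is the start symbol and writing the ε production as an empty tuple; the function names are illustrative.

```python
EPS, END = "ε", "$"

# Final grammar from the answer; an empty tuple is the ε production.
GRAMMAR = {
    "S": [("b", "S", "a"), ("A",), ()],
    "A": [("b", "A", "b"), ()],
}


def first_of_sequence(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the sequence was empty
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="S"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_sequence(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # sym can end this rule: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first()
follow = compute_follow(first)
print(first, follow)
```

The result agrees with the sets given in the question: FIRST(S) = FIRST(A) = { b, ε }, FOLLOW(S) = { a, $ } and FOLLOW(A) = { a, b, $ }.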

Converting grammar to Chomsky Normal Form?

Convert the grammar below into Chomsky Normal Form. Give all the intermediate steps.
S -> AB | aB
A -> aab|lambda
B -> bbA
Ok so the first thing I did was add a new start variable S0
so now I have
S0 -> S
S -> AB | aB
A -> aab|lambda
B -> bbA
then I removed all of the lambda rules:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Then I checked for S->S and A->B type rules which did not exist. And that was the answer I came up with, do I need to do anything further or did I do anything wrong?
Wikipedia says:
In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:
A -> BC, or
A -> α, or
S -> ε
where A, B, C are nonterminal symbols, α is a terminal symbol, S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.
Continuing your work:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Instead of using | to denote different choices, split a rule into multiple rules.
S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Create new rules Y -> a and Z -> b because we will need them soon.
S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b
S -> aB is not of the form S -> BC because a is a terminal. So change a into Y:
S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b
Do the same for the B -> bb rule:
S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> ZZ
Y -> a
Z -> b
For A -> aab, create C -> YY; for B -> bbA, create D -> ZZ:
S0 -> S
S -> AB
S -> YB
S -> B
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
For S -> B, duplicate the one rule where S occurs on the right hand side and inline the rule:
S0 -> B
S0 -> S
S -> AB
S -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
Deal with the rules S0 -> B and S0 -> S by joining the right hand side to the left hand sides of other rules. Also, delete the orphaned rules (where the LHS symbol never gets used on RHS):
S0 -> DA
S0 -> ZZ
S0 -> AB
S0 -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
And we're done. Phew!
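As a cross-check, one can enumerate the language of the original grammar and of the final one and compare them. This sketch only works because neither grammar is recursive, so both languages are finite. It assumes every symbol on a right-hand side is a single character and uses the empty string for lambda; the names language and is_cnf are made up.

```python
from itertools import product

ORIGINAL = {"S": ["AB", "aB"], "A": ["aab", ""], "B": ["bbA"]}

RESULT = {
    "S0": ["DA", "ZZ", "AB", "YB"],
    "A": ["CZ"], "C": ["YY"],
    "B": ["DA", "ZZ"], "D": ["ZZ"],
    "Y": ["a"], "Z": ["b"],
}


def language(grammar, symbol):
    """All strings derivable from symbol (the grammar must be non-recursive)."""
    if symbol not in grammar:  # terminal
        return {symbol}
    result = set()
    for rhs in grammar[symbol]:
        parts = [language(grammar, s) for s in rhs]
        result |= {"".join(combo) for combo in product(*parts)}
    return result


def is_cnf(grammar):
    """Every rule is either two variables or a single terminal."""
    for rules in grammar.values():
        for rhs in rules:
            two_vars = len(rhs) == 2 and all(s in grammar for s in rhs)
            one_terminal = len(rhs) == 1 and rhs not in grammar
            if not (two_vars or one_terminal):
                return False
    return True


print(sorted(language(ORIGINAL, "S")))
print(language(ORIGINAL, "S") == language(RESULT, "S0"), is_cnf(RESULT))
```

Both grammars produce the same six strings, and only the second passes the is_cnf check.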
Without getting into too much theory and proofs (you can look those up on Wikipedia): to convert a context-free grammar to Chomsky Normal Form you generally perform four normal-form transformations. First, identify all the variables that can yield the empty string (lambda/epsilon), directly or indirectly, and remove the lambda rules (lambda-free form). Second, remove unit productions (unit-free form). Third, find all the variables that are live/useful (usefulness). Fourth, find all the reachable symbols (reachability). At each step you might or might not get a new grammar. For your problem this is what I came up with:
Context-Free Grammar
G(Variables = { A B S }
Start = S
Alphabet = { a b lambda }
Production Rules = {
S -> | AB | aB |
A -> | aab | lambda |
B -> | bbA | } )
Remove lambda/epsilon
ERASABLE(G) = { A }
G(Variables = { A S B }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | B |
A -> | aab |
B -> | bbA | bb | } )
Remove unit productions
UNIT(A) { A }
UNIT(B) { B }
UNIT(S) { B S }
G (Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Determine live symbols
LIVE(G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Remove unreachable
REACHABLE (G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Replace all mixed strings with solid nonterminals
G( Variables = { A S B R I }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IIA |
A -> | RRI |
B -> | IIA | II |
R -> | a |
I -> | b | })
Chomsky Normal Form
G( Variables = { V A B S R L I Z }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IV |
A -> | RL |
B -> | IZ | II |
R -> | a |
I -> | b |
L -> | RI |
Z -> | IA |
V -> | IA | })
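Once a grammar is in Chomsky Normal Form, membership can be tested with the CYK algorithm, which gives a way to test the grammar above. A minimal sketch, assuming single-character symbols; the table layout and the name cyk are my own choices.

```python
from itertools import product

# CNF grammar from the answer above; each right-hand side is a string of
# single-character symbols.
CNF = {
    "S": ["AB", "RB", "II", "IV"],
    "A": ["RL"],
    "B": ["IZ", "II"],
    "R": ["a"],
    "I": ["b"],
    "L": ["RI"],
    "Z": ["IA"],
    "V": ["IA"],
}


def cyk(word, grammar=CNF, start="S"):
    n = len(word)
    if n == 0:
        return False  # this grammar has no rule deriving the empty string
    # table[i][l] = set of variables that derive word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {v for v, rules in grammar.items() if ch in rules}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for v, rules in grammar.items():
                    if any(x + y in rules for x, y in product(left, right)):
                        table[i][length].add(v)
    return start in table[0][n]


for w in ["aabbbaab", "aabbb", "bbaab", "bb", "abbaab", "abb", "aab", "ab"]:
    print(w, cyk(w))
```

The first six words are the strings the original grammar derives and are accepted; aab and ab are not in the language and are rejected.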
Alternative answer: The grammar can only produce a finite number of strings, namely 6.
S -> aabbbaab | aabbb | bbaab | bb | abbaab | abb.
You can now condense this back to Chomsky Normal Form by hand.
By substitution, we can find the set of all strings produced. Your initial rules:
S -> AB | aB.
A -> aab | lambda.
B -> bbA.
First split up the S rule:
S -> AB.
S -> aB.
Now substitute what A and B expand into:
S -> AB
-> (aab | lambda) bbA
-> (aab | lambda) bb (aab | lambda).
S -> aB
-> abbA
-> abb (aab | lambda).
Expand these again to get:
S -> aabbbaab.
S -> aabbb.
S -> bbaab.
S -> bb.
S -> abbaab.
S -> abb.
To change this finite set to Chomsky Normal Form, it suffices to do it by brute force without any intelligent factoring. First we introduce two terminal rules:
X -> a.
Y -> b.
Now for each string, we consume the first letter with a terminal variable and the remaining letters with a new variable. For example, like this:
S -> aabbb. (initial rule, not in Chomsky Normal Form)
S -> XC, where X->a and C->abbb.
C -> XD, where X->a and D->bbb.
D -> YE, where Y->b and E->bb.
E -> YY, where Y->b and Y->b.
We just go through this process for all 6 strings, generating a lot of new intermediate variables.
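That brute-force process is easy to automate. The sketch below follows the same idea (first letter via X or Y, remainder via a fresh variable). The fresh variable names N1, N2, ... and both function names are made up here, and it assumes every string has at least two letters, which holds for all six.

```python
from itertools import count

STRINGS = ["aabbbaab", "aabbb", "bbaab", "bb", "abbaab", "abb"]


def finite_set_to_cnf(strings, start="S"):
    """Right-branching CNF rules for a finite set of strings of length >= 2."""
    terminal_var = {"a": "X", "b": "Y"}      # X -> a and Y -> b, as in the text
    rules = [("X", "a"), ("Y", "b")]
    fresh = (f"N{i}" for i in count(1))      # hypothetical names for new variables
    for s in strings:
        head = start
        for ch in s[:-2]:
            nxt = next(fresh)
            rules.append((head, (terminal_var[ch], nxt)))
            head = nxt
        # The last two letters need no further variable.
        rules.append((head, (terminal_var[s[-2]], terminal_var[s[-1]])))
    return rules


def expand(rules, symbol):
    """All strings derivable from symbol under the generated rules."""
    out = set()
    for head, rhs in rules:
        if head != symbol:
            continue
        if isinstance(rhs, str):
            out.add(rhs)
        else:
            out |= {l + r for l in expand(rules, rhs[0])
                    for r in expand(rules, rhs[1])}
    return out


rules = finite_set_to_cnf(STRINGS)
for head, rhs in rules:
    print(head, "->", rhs if isinstance(rhs, str) else " ".join(rhs))
print(expand(rules, "S") == set(STRINGS))
```

For aabbb this produces the same chain as the hand-worked example, with N-names in place of C, D and E. The result has many more rules than the factored grammars above, which is the price of skipping any intelligent factoring.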