How to write this CFG? - grammar

The question is to construct a CFG that generates the language L = {a^n b^m | n ≠ m}.
My solution is S -> aSb | aS | bS | a | b. However, this grammar can also generate strings like aabb, which has as many a's as b's. So how should I do it?
Thanks for the help.

So you want a string of a's then a string of b's, with an unequal number of a's and b's. First, let's ignore the equality condition. Then:
S -> aSb | 0
will generate strings of a's followed by b's. This rule guarantees an equal number of a's and b's (or the empty string). Now what we want is either more a's or more b's, but not both: if we wanted one more a AND one more b, we'd just apply S again. So we add two new rules:
A -> aA | a
B -> bB | b
and update S to be:
S -> aSb | A | B
So now we can add an equal number of a's and b's, and then add more a's or more b's, but not both. Since A and B always produce at least one letter, this guarantees inequality, and it also correctly excludes the empty string (zero a's and zero b's). If you want to use null productions (written 0 here) instead, you have to be careful. We can't do:
S -> aSb | A | B | 0,
because that can lead to S -> aSb -> a0b -> ab, which violates the condition. We also can't do:
A -> aA | 0,
because that can produce S -> aSb -> aAb -> a0b -> ab. So what do we do? The trick is to make S's last expansion emit at least one extra a or b before handing over to A or B, like this:
S -> aSb | aA | bB
A -> aA | 0
B -> bB | 0
And that's your solution.
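To double-check the final grammar, here is a small brute-force sketch in Python. It is not part of the original answer: the helper `generate` and the length bound `N` are illustrative assumptions. It enumerates every string of bounded length that a grammar derives (leftmost derivations, breadth-first) and compares the result with {a^n b^m | n ≠ m}. The null string 0 is written as "" and each symbol is one character, with uppercase letters as nonterminals.

```python
from collections import deque


def generate(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    Hypothetical helper: grammar maps each nonterminal (one uppercase letter)
    to a list of right-hand sides; each right-hand side is a string of
    one-character symbols, and "" is the null string. Assumes no rule can
    grow a sentential form forever without adding terminals.
    """
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal; if there is none, form is a sentence.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Terminals are never erased, so forms that are already too long are pruned.
            if sum(sym not in grammar for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


# Final grammar from the answer.
answer = {"S": ["aSb", "aA", "bB"], "A": ["aA", ""], "B": ["bB", ""]}
# The grammar from the question.
question = {"S": ["aSb", "aS", "bS", "a", "b"]}

N = 8
target = {"a" * n + "b" * m
          for n in range(N + 1) for m in range(N + 1)
          if n + m <= N and n != m}

print(generate(answer, "S", N) == target)    # True
print("aabb" in generate(question, "S", N))  # True: the question's grammar is too loose
```

A bounded enumeration like this can catch mistakes, but it only checks short strings; it does not prove the grammar correct.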

Related

Context Free Grammar BNF

Need help with a non-extended BNF grammar:
Σ = {a,b,c}
L = {ω ∈ Σ* | all a's (if any) come before all c's (if any)}
For example, the strings aba, cbc, and abacbc are in the language, but string abcabc is not.
This is what I have so far (is it correct? Please correct me if I am wrong):
s->asbsc|bsasc|ascsb|ɛ
Your comment says you want equal numbers of a and c, so start with the simple grammar that does that:
S -> aSc | ε
and add in any number of b's before/after/between those:
S -> BaScB | B
B -> Bb | ε
Note that the above is not ambiguous (it's even LR(1)).
If you want to allow a different number of a's and c's, you can use the same approach to avoid ambiguity. Start with just the a's and c's:
S -> AC
A -> Aa | ε
C -> Cc | ε
and add in b's at the beginning and after each other character:
S -> BAC
A -> AaB | ε
C -> CcB | ε
B -> Bb | ε
Do the numbers of a's and c's need to be the same? If not, then you are missing the cases where they differ, such as aac. I think something like this should work:
S -> AC
A -> aA | bA | ε
C -> bC | cC | ε
The A production is used for deriving a sequence of characters that are not a c and the C production is used for deriving a sequence of characters that are not an a.
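The three grammars above can be checked the same way with a bounded brute-force sketch (an illustration, not from the original answers; `generate` and `a_before_c` are hypothetical helpers, and ε is written as ""). It compares each grammar's strings of length at most 5 with the strings over {a,b,c} in which no a follows a c.

```python
from collections import deque
from itertools import product


def generate(grammar, start, max_len):
    """Hypothetical helper: every terminal string of length <= max_len derivable
    from start (one-character symbols, uppercase = nonterminal, "" = ε)."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:  # no nonterminals left: a finished sentence
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(sym not in grammar for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


def a_before_c(w):
    """True if no a appears after the first c."""
    return "c" not in w or "a" not in w[w.index("c"):]


N = 5
words = {"".join(p) for n in range(N + 1) for p in product("abc", repeat=n)}

# S -> BaScB | B,  B -> Bb | ε   (equal numbers of a's and c's)
equal = {"S": ["BaScB", "B"], "B": ["Bb", ""]}
# S -> BAC,  A -> AaB | ε,  C -> CcB | ε,  B -> Bb | ε
unambiguous = {"S": ["BAC"], "A": ["AaB", ""], "C": ["CcB", ""], "B": ["Bb", ""]}
# S -> AC,  A -> aA | bA | ε,  C -> bC | cC | ε
simple = {"S": ["AC"], "A": ["aA", "bA", ""], "C": ["bC", "cC", ""]}

same_count = {w for w in words if a_before_c(w) and w.count("a") == w.count("c")}
any_count = {w for w in words if a_before_c(w)}

print(generate(equal, "S", N) == same_count)       # True
print(generate(unambiguous, "S", N) == any_count)  # True
print(generate(simple, "S", N) == any_count)       # True
```

The enumeration only compares languages; it says nothing about ambiguity, which is where the BAC grammar and the AC grammar differ.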

Grammar for given language

I am given the language {w ∈ {a,b}* | |w|_a = |w|_b + 1} and am asked to find a grammar.
I have come up with the following:
S->aSb | bSa | aAa | bBb | a
A->bS
B->?
and I was wondering whether this is correct, and if not, why?
It's not correct, because it cannot generate the valid sentence:
baaab
which has one more a than b. It should be obvious that this sentence cannot be generated because every sentence generated by your language has different start and end characters.
Edit The edited question is also not correct because the productions:
S -> ... | aAa | a | ...
A -> bS
is equivalent to (by substituting the RHS of A for its use in S):
S -> ... | abSa | a | ...
which will match as follows:
S -> abSa -> abaa, which has three a's but only one b.
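A quick way to see both problems is to enumerate the edited grammar for short strings. In this sketch the `generate` helper is a hypothetical brute-force enumerator, and B gets an empty list of productions to mirror `B->?`, so any derivation that reaches B is abandoned.

```python
from collections import deque


def generate(grammar, start, max_len):
    """Hypothetical helper: every terminal string of length <= max_len derivable
    from start (one-character symbols, uppercase = nonterminal)."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:  # no nonterminals left: a finished sentence
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(sym not in grammar for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


# The edited grammar from the question; B has no productions (B -> ?),
# so any sentential form containing B is a dead end.
student = {"S": ["aSb", "bSa", "aAa", "bBb", "a"], "A": ["bS"], "B": []}

words = generate(student, "S", 5)
print("baaab" in words)  # False: a valid sentence is missed
print("abaa" in words)   # True: an invalid sentence is produced
bad = sorted(w for w in words if w.count("a") != w.count("b") + 1)
print(bad)               # ['abaa']
```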

Tips for creating "Context Free Grammar"

I am new to CFGs.
Can someone give me tips for creating a CFG that generates a given language?
For example:
L = {a^m b^n | m >= n}
What I got is:
So -> a | aSo | aS1 | e
S1 -> b | bS1 | e
but I think this grammar is wrong, because it can generate more b's than a's.
How to write a CFG, with the example a^m b^n
L = {a^m b^n | m >= n}.
Language description: a^m b^n consists of a's followed by b's, where the number of a's is equal to or greater than the number of b's.
Some example strings: {^, a, aa, aab, aabb, aaaab, ab, ...}
So there is always one a for each b, but extra a's are possible. In fact, a string can consist of a's only. Also notice that ^ (the null string) is a member of the language, because in ^, NumberOf(a) = NumberOf(b) = 0.
How do we write a grammar that generates the language of strings a^m b^n?
In the grammar, there should be rules such that whenever you add a b symbol you also add an a symbol,
and this can be done with something like:
S --> aSb
But this is incomplete, because we need a rule to generate the extra a's:
A --> aA | a
Combine the two production rules into a single CFG:
S --> aSb | A
A --> aA | a
So you can generate any string that consists of a's only, or of a's followed by fewer b's, in the a^m b^n pattern.
But in the above grammar there is no way to generate the ^ string.
So change the grammar like this:
S --> B | ^
B --> aBb | A
A --> aA | a
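This grammar generates ^ and every a^m b^n with m > n. Note, however, that it still cannot generate strings with m = n ≥ 1, such as ab or aabb, because every derivation of B ends with A, which adds at least one extra a.
Note: to generate the ^ (null) string, I added an extra first step to the grammar, S --> B | ^, so you can produce either ^ or your string of a's and b's. (Now B plays the role of S from the previous grammar, generating matching a's and b's.)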
Edit: Thanks to #Andy Hayden
You can also write a simpler grammar for the same language {a^m b^n | m >= n}:
S --> aSb | A
A --> aA | ^
Notice that here A --> aA | ^ can generate zero or more a's. This is preferable to my grammar, because it generates a smaller parse tree for the same string
(a smaller height is preferable for efficient parsing).
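The difference between the two grammars shows up quickly in a bounded enumeration. This sketch (with a hypothetical `generate` helper, and ^ written as "") compares both against {a^m b^n | m >= n} for strings of length at most 8: the edited grammar matches, while the first grammar misses exactly the strings with as many b's as a's.

```python
from collections import deque


def generate(grammar, start, max_len):
    """Hypothetical helper: every terminal string of length <= max_len derivable
    from start (one-character symbols, uppercase = nonterminal, "" = ^)."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:  # no nonterminals left: a finished sentence
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(sym not in grammar for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


# S --> B | ^,  B --> aBb | A,  A --> aA | a    (the first grammar)
first = {"S": ["B", ""], "B": ["aBb", "A"], "A": ["aA", "a"]}
# S --> aSb | A,  A --> aA | ^                  (the edited grammar)
edited = {"S": ["aSb", "A"], "A": ["aA", ""]}

N = 8
target = {"a" * m + "b" * n
          for m in range(N + 1) for n in range(m + 1) if m + n <= N}

print(generate(edited, "S", N) == target)        # True
print(sorted(target - generate(first, "S", N)))  # ['aaaabbbb', 'aaabbb', 'aabb', 'ab']
```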
The following tips may be helpful for writing a grammar for a formal language:
Be clear about what the language describes (its meaning/pattern).
Remember solutions to some basic problems (the idea being that you can build new grammars from them).
Learn the rules for fundamental constructions, for example the rules for turning a regular expression into a right-linear grammar. Such rules help you write grammars for new languages.
A different approach is to first draw an automaton, then convert it to a grammar. There are standard techniques for converting automata into grammars for each class of formal language.
Like a good programmer who learns by reading other people's code, one can learn to write grammars for formal languages by reading existing ones.
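For the regular case, the automaton-to-grammar conversion mentioned in the tips is short enough to sketch. Assuming a DFA given as a transition table, each state becomes a nonterminal, each transition q --x--> p becomes q -> x p, and each accepting state q gets q -> ε. The DFA for a*b* below, and the helper names `dfa_to_grammar`, `accepts` and `generate`, are illustrative assumptions.

```python
from collections import deque
from itertools import product


def dfa_to_grammar(transitions, accepting):
    """Right-linear grammar from a DFA (hypothetical helper). States must be
    single uppercase letters; "" stands for ε."""
    grammar = {}
    for (state, symbol), target in transitions.items():
        grammar.setdefault(state, []).append(symbol + target)  # q -> x p
        grammar.setdefault(target, [])
    for state in accepting:
        grammar.setdefault(state, []).append("")  # accepting q -> ε
    return grammar


def accepts(transitions, start, accepting, word):
    """Run the DFA; a missing transition means an implicit dead state."""
    state = start
    for ch in word:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return state in accepting


def generate(grammar, start, max_len):
    """Every terminal string of length <= max_len derivable from start."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(sym not in grammar for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


# Hypothetical example DFA for a*b*: P reads a's, Q reads b's.
delta = {("P", "a"): "P", ("P", "b"): "Q", ("Q", "b"): "Q"}
final = ["P", "Q"]
grammar = dfa_to_grammar(delta, final)
print(grammar)  # {'P': ['aP', 'bQ', ''], 'Q': ['bQ', '']}

N = 6
words = {"".join(p) for n in range(N + 1) for p in product("ab", repeat=n)}
print(generate(grammar, "P", N) == {w for w in words if accepts(delta, "P", final, w)})  # True
```

Pushdown automata can also be converted to context-free grammars, but that construction is considerably longer and is not sketched here.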
Also, the grammar you have written is wrong.
You want to create a grammar for the following language:
L = {a^n b^m | m >= n}
That means the number of b's should be greater than or equal to the number of a's.
In other words, for each b there can be at most one a, not the other way around.
Here is a grammar for this language:
S-> aSb | Sb | b | ab | ε
In this grammar, each a is generated together with one b, but b's can also be generated without any a. The ε alternative covers the empty string, which is also in L.
You can also try these languages:
L1 = {a^n b^m | m > n}
L2 = {a^n b^m | m >= 2n}
L3 = {a^n b^m | 2m >= n}
L4 = {a^n b^m | m != n}
I am giving a grammar for each language.
for L1
S-> aSb | Sb | b
for L2
S-> aSbb | Sb | abb | ε
for L3
S-> aaSb | Sb | aab | ab | b | ε
for L4
S-> S1 | S2
S1-> aS1b | S1b | b
S2-> aS2b | aS2 | a
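The L1 and L4 grammars can be spot-checked with a bounded enumeration. In this sketch `generate` is a hypothetical brute-force helper, and S1, S2 are renamed P, Q because the helper expects one-character symbols.

```python
from collections import deque


def generate(grammar, start, max_len):
    """Hypothetical helper: every terminal string of length <= max_len derivable
    from start (one-character symbols, uppercase = nonterminal)."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:  # no nonterminals left: a finished sentence
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(sym not in grammar for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


# L1: S -> aSb | Sb | b
l1 = {"S": ["aSb", "Sb", "b"]}
# L4: S -> S1 | S2, S1 -> aS1b | S1b | b, S2 -> aS2b | aS2 | a
# (S1 and S2 renamed to P and Q so every symbol is one character)
l4 = {"S": ["P", "Q"], "P": ["aPb", "Pb", "b"], "Q": ["aQb", "aQ", "a"]}

N = 8
pairs = [(n, m) for n in range(N + 1) for m in range(N + 1) if n + m <= N]
print(generate(l1, "S", N) == {"a" * n + "b" * m for n, m in pairs if m > n})   # True
print(generate(l4, "S", N) == {"a" * n + "b" * m for n, m in pairs if m != n})  # True
```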
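With the fewest variables (just one): S -> a S b | a S | e
The same language with one extra (redundant) alternative, a b, which only shortens some derivations:
S -> a S b | a S | a b | e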

First & Follow, Arithmetic Expressions

FIRST(A) = { b, epsilon }
FIRST(S) = { b, epsilon }
FOLLOW(S) = { a, $ }
FOLLOW(A) = { a, b, $ }
What grammar (set of production rules) gives these FIRST and FOLLOW sets?
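FIRST(X) = the terminals that can appear first when trying to parse the non-terminal X. If X can match the empty string, epsilon is also included.
FOLLOW(X) = the terminals that can appear immediately after the non-terminal X. It is the union of the FIRST sets (minus epsilon) of whatever follows X in any parsing rule; when everything after X can derive the empty string, FOLLOW of that rule's left-hand side is added as well, and the end marker $ is in FOLLOW of the start symbol.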
Read more: LL parser
The clues given are:
FIRST(A), FIRST(S) ⇒ All of the derivations of A and S respectively, must either begin with the terminal b, or be zero-length.
S → b ... | ε
A → b ... | ε
FOLLOW(S) ⇒ There must be some construction where S is followed by the terminal a, or a non-terminal which can begin with a. (Neither A nor S qualify).
S → b S a | ε
A → b ... | ε
FOLLOW(A) ⇒ There must be some construction where A is followed by each of the terminals a and b, or some non-terminal which can begin with those.
S → b S a | ε
A → b A b | b A a | ε
FOLLOW(A) ⇒ Assuming S is the start-symbol, A must appear at the end of some branch of S, possibly followed by other nullable non-terminals.
S → b S a | A | ε
A → b A b | b A a | ε
(NB. Adding A to S did not break the constraint on FIRST(S))
We can make the grammar a little smaller:
S → b S a | A | ε
A → b A b | ε
We can no longer generate strings like "bbbabb", but the smaller grammar still satisfies all the constraints.
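The sets can also be verified mechanically. This is a sketch of the usual fixed-point computation of FIRST and FOLLOW (the function name `first_follow`, and the use of "" for ε productions, are assumptions of the sketch), run on both grammars from the answer.

```python
EPS, END = "ε", "$"


def first_follow(grammar, start):
    """Fixed-point computation of FIRST and FOLLOW sets (hypothetical helper).

    grammar maps each nonterminal (one uppercase letter) to a list of
    right-hand sides written as strings; "" is the empty production.
    """
    nts = set(grammar)
    first = {nt: set() for nt in nts}

    def first_of(seq):
        """FIRST of a symbol sequence; contains EPS if the whole sequence is nullable."""
        out = set()
        for sym in seq:
            if sym not in nts:          # a terminal ends the scan
                out.add(sym)
                return out
            out |= first[sym] - {EPS}
            if EPS not in first[sym]:   # a non-nullable nonterminal ends the scan
                return out
        out.add(EPS)
        return out

    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of(rhs)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True

    follow = {nt: set() for nt in nts}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in nts:
                        continue
                    rest = first_of(rhs[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:     # sym can end the rule, so FOLLOW(nt) flows in
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


# S → b S a | A | ε,  A → b A b | b A a | ε
full = {"S": ["bSa", "A", ""], "A": ["bAb", "bAa", ""]}
# The smaller grammar: S → b S a | A | ε,  A → b A b | ε
smaller = {"S": ["bSa", "A", ""], "A": ["bAb", ""]}

for g in (full, smaller):
    first, follow = first_follow(g, "S")
    print(first["S"] == first["A"] == {"b", EPS},
          follow["S"] == {"a", END},
          follow["A"] == {"a", "b", END})  # True True True
```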

Converting grammar to Chomsky Normal Form?

Convert the grammar below into Chomsky Normal Form. Give all the intermediate steps.
S -> AB | aB
A -> aab|lambda
B -> bbA
OK, so the first thing I did was add a new start variable S0,
so now I have:
S0 -> S
S -> AB | aB
A -> aab|lambda
B -> bbA
Then I removed all of the lambda rules:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Then I checked for S->S and A->B type rules, which did not exist. That was the answer I came up with. Do I need to do anything further, or did I do anything wrong?
Wikipedia says:
In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:
A -> BC, or
A -> α, or
S -> ε
where A, B, C are nonterminal symbols, α is a terminal symbol, S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.
Continuing your work:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Instead of using | to denote different choices, split a rule into multiple rules.
S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Create new rules Y -> a and Z -> b because we will need them soon.
S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b
S -> aB is not of the form S -> BC because a is a terminal. So change a into Y:
S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b
Do the same for the B -> bb rule:
S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> ZZ
Y -> a
Z -> b
For A -> aab, create C -> YY; for B -> bbA, create D -> ZZ:
S0 -> S
S -> AB
S -> YB
S -> B
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
For S -> B, duplicate the one rule where S occurs on the right hand side and inline the rule:
S0 -> B
S0 -> S
S -> AB
S -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
Eliminate the unit rules S0 -> B and S0 -> S by giving S0 copies of the right-hand sides of B's and S's rules. Also, delete the orphaned rules (those whose LHS symbol never appears on any RHS):
S0 -> DA
S0 -> ZZ
S0 -> AB
S0 -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
And we're done. Phew!
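A brute-force check that the conversion preserved the language: both grammars are non-recursive, so their languages are finite and can be listed completely. The helpers `language` and `is_cnf` are hypothetical, and S0 is renamed T because the sketch uses one-character symbols.

```python
from itertools import product


def language(grammar, symbol):
    """All strings derivable from symbol in a grammar without recursion
    (so the language is finite). Symbols are single characters; "" is ε."""
    if symbol not in grammar:
        return {symbol}  # a terminal derives only itself
    result = set()
    for rhs in grammar[symbol]:
        for parts in product(*(language(grammar, s) for s in rhs)):
            result.add("".join(parts))
    return result


def is_cnf(grammar, start):
    """Every rule is X -> YZ (two nonterminals, neither the start symbol) or X -> t."""
    for rhss in grammar.values():
        for rhs in rhss:
            binary = len(rhs) == 2 and all(s in grammar and s != start for s in rhs)
            terminal = len(rhs) == 1 and rhs not in grammar
            if not (binary or terminal):
                return False
    return True


original = {"S": ["AB", "aB"], "A": ["aab", ""], "B": ["bbA"]}
# The final grammar above, with S0 renamed to T so every symbol is one character.
final = {
    "T": ["DA", "ZZ", "AB", "YB"],
    "A": ["CZ"],
    "C": ["YY"],
    "B": ["DA", "ZZ"],
    "D": ["ZZ"],
    "Y": ["a"],
    "Z": ["b"],
}

print(is_cnf(final, "T"))                                # True
print(language(final, "T") == language(original, "S"))  # True
print(sorted(language(original, "S")))  # ['aabbb', 'aabbbaab', 'abb', 'abbaab', 'bb', 'bbaab']
```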
Without getting into too much theory and proofs (you can look these up on Wikipedia), there are a few things you must do when converting a context-free grammar to Chomsky Normal Form; generally you have to perform four normal-form transformations. First, identify all the variables that can yield the empty string (lambda/epsilon), directly or indirectly (Lambda-Free form). Second, remove unit productions (Unit-Free form). Third, find all the variables that are live/useful (Usefulness). Fourth, find all the reachable symbols (Reachable). At each step you might or might not get a new grammar. So for your problem, this is what I came up with...
Context-Free Grammar
G(Variables = { A B S }
Start = S
Alphabet = { a b lambda}
Production Rules = {
S -> | AB | aB |
A -> | aab | lambda |
B -> | bbA | } )
Remove lambda/epsilon
ERASABLE(G) = { A }
G(Variables = { A S B }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | B |
A -> | aab |
B -> | bbA | bb | } )
Remove unit productions
UNIT(A) = { A }
UNIT(B) = { B }
UNIT(S) = { B S }
G (Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Determine live symbols
LIVE(G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Remove unreachable
REACHABLE (G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Replace all mixed strings with solid nonterminals
G( Variables = { A S B R I }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IIA |
A -> | RRI |
B -> | IIA | II |
R -> | a |
I -> | b | })
Chomsky Normal Form
G( Variables = { V A B S R L I Z }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IV |
A -> | RL |
B -> | IZ | II |
R -> | a |
I -> | b |
L -> | RI |
Z -> | IA |
V -> | IA | })
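Since the result is in Chomsky Normal Form, membership can be tested with the CYK algorithm. This Python sketch (the function `cyk` is an illustration, not from the answer) runs it on every string over {a, b} of length at most 8 and checks that exactly the six strings produced by the original grammar are accepted.

```python
from itertools import product


def cyk(grammar, start, word):
    """CYK membership test for a grammar in Chomsky Normal Form (hypothetical helper).

    grammar maps nonterminals to right-hand sides: either one terminal
    ("a") or two nonterminals ("AB"). The empty word is rejected because
    this grammar has no S -> ε rule.
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][j] = nonterminals deriving the substring word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {nt for nt, rhss in grammar.items() if ch in rhss}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for nt, rhss in grammar.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rhss):
                        table[i][length - 1].add(nt)
    return start in table[0][n - 1]


cnf = {
    "S": ["AB", "RB", "II", "IV"],
    "A": ["RL"],
    "B": ["IZ", "II"],
    "R": ["a"],
    "I": ["b"],
    "L": ["RI"],
    "Z": ["IA"],
    "V": ["IA"],
}

expected = {"aabbbaab", "aabbb", "bbaab", "bb", "abbaab", "abb"}
words = ("".join(p) for n in range(9) for p in product("ab", repeat=n))
accepted = {w for w in words if cyk(cnf, "S", w)}
print(accepted == expected)  # True
```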
Alternative answer: The grammar can only produce a finite number of strings, namely 6.
S -> aabbbaab | aabbb | bbaab | bb | abbaab | abb.
You can now condense this back to Chomsky Normal Form by hand.
By substitution, we can find the set of all strings produced. Your initial rules:
S -> AB | aB.
A -> aab | lambda.
B -> bbA.
First split up the S rule:
S -> AB.
S -> aB.
Now substitute what A and B expand into:
S -> AB
-> (aab | lambda) bbA
-> (aab | lambda) bb (aab | lambda).
S -> aB
-> abbA
-> abb (aab | lambda).
Expand these again to get:
S -> aabbbaab.
S -> aabbb.
S -> bbaab.
S -> bb.
S -> abbaab.
S -> abb.
To change this finite set to Chomsky Normal Form, it suffices to do it by brute force without any intelligent factoring. First we introduce two terminal rules:
X -> a.
Y -> b.
Now, for each string, we consume the first letter with a terminal variable and the remaining letters with a new variable. For example:
S -> aabbb. (initial rule, not in Chomsky Normal Form)
S -> XC, where X->a and C->abbb.
C -> XD, where X->a and D->bbb.
D -> YE, where Y->b and E->bb.
E -> YY, where Y->b and Y->b.
We just go through this process for all 6 strings, generating a lot of new intermediate variables.
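The brute-force construction can be automated. This sketch assumes every string has length at least 2 (true for all six here) and names the fresh variables N1, N2, ... instead of C, D, E; `finite_to_cnf` and `language` are hypothetical helpers, and right-hand sides are stored as tuples of symbol names.

```python
from itertools import product


def finite_to_cnf(strings, start="S"):
    """Brute-force CNF for a finite set of strings over {a, b}, each of length >= 2.

    X -> a and Y -> b are the terminal rules; each string is peeled off one
    letter at a time through fresh variables N1, N2, ... (hypothetical names).
    """
    term = {"a": "X", "b": "Y"}
    grammar = {start: [], "X": [("a",)], "Y": [("b",)]}
    count = 0
    for s in strings:
        assert len(s) >= 2, "this sketch does not handle strings shorter than 2"
        head, rest = start, s
        while len(rest) > 2:
            count += 1
            var = f"N{count}"
            grammar[head].append((term[rest[0]], var))  # head -> (first letter) var
            grammar[var] = []
            head, rest = var, rest[1:]
        grammar[head].append((term[rest[0]], term[rest[1]]))  # the last two letters
    return grammar


def language(grammar, symbol):
    """All strings derivable from symbol (the grammar must have no recursion)."""
    if symbol not in grammar:
        return {symbol}  # a terminal derives only itself
    return {"".join(parts)
            for rhs in grammar[symbol]
            for parts in product(*(language(grammar, s) for s in rhs))}


strings = ["aabbbaab", "aabbb", "bbaab", "bb", "abbaab", "abb"]
cnf = finite_to_cnf(strings)
print(language(cnf, "S") == set(strings))  # True
print(finite_to_cnf(["aabbb"]))
# {'S': [('X', 'N1')], 'X': [('a',)], 'Y': [('b',)], 'N1': [('X', 'N2')], 'N2': [('Y', 'N3')], 'N3': [('Y', 'Y')]}
```

For the single string aabbb it yields S -> X N1, N1 -> X N2, N2 -> Y N3, N3 -> Y Y, which matches the hand-made chain above with N1, N2, N3 in place of C, D, E.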