I need help with my context-free grammar for this language:
{b^4 n^m b d^n n^(3n+m) | m, n >= 0}
So far I got this:
S -> bbbbXbY
X -> n | E
Y -> dYnnnX
Assuming S is the start symbol and E is the empty word (𝜀), your language is {b^4 n^r b d^s n^(3s+r) | r, s ≥ 0}.
The correct grammar
S → bbbbN
N → bD | nNn
D → dDnnn | 𝜀
Explanation
S generates b^4 on the left and switches to N. It never occurs in the derivation again.
N generates n^r on both sides, then b in the middle and switches to D. After that it never occurs in the derivation again.
Finally D generates d^s n^(3s) and finishes the derivation.
S (start symbol)
→ b^4 N (applied S → bbbbN)
→ b^4 n^r N n^r (applied N → nNn r times)
→ b^4 n^r bD n^r (applied N → bD)
→ b^4 n^r b d^s D (nnn)^s n^r (applied D → dDnnn s times)
→ b^4 n^r b d^s (nnn)^s n^r (applied D → 𝜀)
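If you want to sanity-check the grammar mechanically, here is a minimal Python sketch (the GRAMMAR encoding and the expand helper are my own, not part of the answer): it expands the productions to a bounded depth and checks every generated string against b^4 n^r b d^s n^(3s+r).

from itertools import product

# The corrected grammar; "" stands for epsilon.
GRAMMAR = {
    "S": ["bbbbN"],
    "N": ["bD", "nNn"],
    "D": ["dDnnn", ""],
}

def expand(sym, depth):
    """All terminal strings derivable from sym within the given recursion depth."""
    if sym not in GRAMMAR:       # terminals expand to themselves
        return {sym}
    if depth == 0:
        return set()
    out = set()
    for rhs in GRAMMAR[sym]:
        # Expand every symbol of the right-hand side, then concatenate the alternatives.
        parts = [expand(c, depth - 1) for c in rhs] or [{""}]
        for combo in product(*parts):
            out.add("".join(combo))
    return out

for w in sorted(expand("S", 8), key=len):
    core = w[4:]                              # strip the leading b^4
    r = len(core) - len(core.lstrip("n"))     # count the n^r prefix
    rest = core[r:]
    assert rest[0] == "b"
    tail = rest[1:]
    s = len(tail) - len(tail.lstrip("d"))     # count the d^s block
    assert tail == "d" * s + "n" * (3 * s + r), w
print("all generated strings match b^4 n^r b d^s n^(3s+r)")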
Mistakes in the original grammar
Your grammar generates the empty language, because Y is present in the sentential form after every derivation step (infinite recursion). There is also a fundamental problem with context: the first sequence of ns is generated by the X in S → bbbbXbY independently of the second one, generated by the Xs from Y → dYnnnX. If you add Y → 𝜀, the language will be {b^4 n^r b d^s n^(3s+t)} with t independent of r. And the grammar will be ambiguous: bbbbXbddYnnnXnnnX can be generated (using Y → dYnnnX twice), and the final sequence of ns can usually be generated in many ways.
Steps to fix the original grammar
Add Y → 𝜀 to stop infinite recursion.
Move X from the end of Y → dYnnnX to the end of S → bbbbXbY to get rid of ambiguity.
Chain the Xs in the newly created S → bbbbXbYX together to force the context: the same number of ns must be generated by both of them.
Now you have the correct grammar at the top of this answer.
Related
Assume that I have two nonterminals X and Y with the productions
X -> AL | BL | X,
Y -> CK | DK | X.
I guess it is not possible to remove the unit production in X, but is this the case for Y?
You can remove the unit production X → X, as it has no effect: any use of X → X in a derivation can be dropped and replaced by one of the other X productions.
Having done that, you can then remove all unit productions for Y by replacing Y → X with Y → ω for each remaining production X → ω.
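In case it helps to see it mechanically, here is a small Python sketch (the encoding and helper name are mine) that performs exactly these two steps on the productions above; it assumes no unit cycles other than self-loops:

# Productions as nonterminal -> list of right-hand sides (tuples of symbols).
prods = {
    "X": [("A", "L"), ("B", "L"), ("X",)],
    "Y": [("C", "K"), ("D", "K"), ("X",)],
}
NONTERMINALS = set(prods)

def remove_unit_productions(prods):
    # Step 1: X -> X never changes a derivation, so drop all self-loops.
    prods = {a: [r for r in rhss if r != (a,)] for a, rhss in prods.items()}
    # Step 2: replace each remaining unit production A -> B by B's own productions.
    changed = True
    while changed:
        changed = False
        for a, rhss in prods.items():
            for rhs in [r for r in rhss if len(r) == 1 and r[0] in NONTERMINALS]:
                rhss.remove(rhs)
                for r in prods[rhs[0]]:
                    if r not in rhss and r != (a,):  # avoid re-adding a self-loop
                        rhss.append(r)
                changed = True
    return prods

print(remove_unit_productions(prods))
# {'X': [('A','L'), ('B','L')], 'Y': [('C','K'), ('D','K'), ('A','L'), ('B','L')]}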
I'm trying to understand CFGs by using an example with some obstacles in it.
For example, I want to match the declaration of a double variable:
double d; In this case, "d" could be any other valid identifier.
There are some cases that should not be matched, e.g. "double double;", but I don't understand how to avoid matching the second "double".
My approach:
G = (Σ, V, S, P)
Σ = {a-z}
V = {S,T,U,W}
P = { S -> doubleTUW
T -> _(space)
U -> (a-z)U | (a-z)
W -> ;
}
Now there must be a way to limit the possible outcomes of this grammar by restricting the language L(G). Unfortunately, I couldn't find a construction that meets my requirement to deny a second "double".
Here's a somewhat tedious regular expression to match any identifier other than double (the alternatives at the end cover the proper prefixes d, do, dou, doub, and doubl, which are valid identifiers too):
([a-ce-z]|d[a-np-z]|do[a-tv-z]|dou[ac-z]|doub[a-km-z]|doubl[a-df-z]|double[a-z])[a-z]*|d|do|dou|doub|doubl
Converting it to a CFG can be done mechanically, but it is even more tedious (ID is the start symbol):
ALPHA → a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_B → a|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_D → a|b|c|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_E → a|b|c|d|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_L → a|b|c|d|e|f|g|h|i|j|k|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_O → a|b|c|d|e|f|g|h|i|j|k|l|m|n|p|q|r|s|t|u|v|w|x|y|z
NOT_U → a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|v|w|x|y|z
ID → WORD | d | do | dou | doub | doubl
WORD → NOT_D
| d NOT_O
| do NOT_U
| dou NOT_B
| doub NOT_L
| doubl NOT_E
| double ALPHA
| WORD ALPHA
This is why many of us usually use scanner generators like (f)lex, which handle such exclusions automatically.
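If you want to check the pattern quickly, here is a short Python sketch using the re module (the sample identifiers are my own):

import re

# Matches any lowercase identifier except the keyword "double".
NOT_DOUBLE = re.compile(
    r"([a-ce-z]|d[a-np-z]|do[a-tv-z]|dou[ac-z]|doub[a-km-z]"
    r"|doubl[a-df-z]|double[a-z])[a-z]*"
    r"|d|do|dou|doub|doubl"
)

for ident in ["d", "dx", "doubles", "grade", "double"]:
    print(ident, bool(NOT_DOUBLE.fullmatch(ident)))
# Prints True for everything except "double".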
I find it difficult to construct a grammar for a language, especially a linear grammar.
Can anyone please give me some basic tips/methodology for constructing the grammar for a given language? Thanks in advance.
I have doubts about whether the following answer to the question "Construct a linear grammar for the language" is right:
L = {a^n b c^n | n ∈ ℕ}
Solution:
Right-linear grammar:
S -> aS | bA
A -> cA | ^
Left-linear grammar:
S -> Sc | Ab
A -> Aa | ^
As pointed out in the comments, these grammars are wrong since they generate strings not in the language. Here's a derivation of abcc in both grammars:
S -> aS -> abA -> abcA -> abccA -> abcc
S -> Sc -> Scc -> Abcc -> Aabcc -> abcc
Also as pointed out in the comments, there is a simple linear grammar for this language, where a linear grammar is defined as having at most one nonterminal symbol in the RHS of any production:
S -> aSc | b
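To convince yourself quickly, here is a minimal recursive membership check in Python that mirrors this grammar directly (the function name is mine):

def derivable(w):
    """True iff w can be derived from S -> aSc | b."""
    if w == "b":
        return True                 # S -> b
    if len(w) >= 3 and w[0] == "a" and w[-1] == "c":
        return derivable(w[1:-1])   # S -> aSc peels one matching a...c pair
    return False

print(derivable("aabcc"))  # True:  a^2 b c^2 is in the language
print(derivable("abcc"))   # False: derived by the broken grammars above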
There are some general rules for constructing grammars for languages. These are either obvious simple rules or rules derived from closure properties and the way grammars work. For instance (a combined example follows the list):
if L = {a} for an alphabet symbol a, then S -> a is a grammar for L.
if L = {e} for the empty string e, then S -> e is a grammar for L.
if L = R U T for languages R and T, then S -> S' | S'', together with the grammars for R and T, is a grammar for L, where S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
if L = RT for languages R and T, then S -> S'S'' is a grammar for L if S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
if L = R* for language R, then S -> S'S | e is a grammar for L if S' is the start symbol of the grammar for R.
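For example, combining rules 1, 3, and 5 gives a grammar for L = ({a} ∪ {b})* = {a, b}* (the nonterminal names are arbitrary):
A -> a (rule 1 applied to {a})
B -> b (rule 1 applied to {b})
U -> A | B (rule 3 for the union {a} ∪ {b})
S -> US | e (rule 5 for the star; S is the start symbol)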
Rules 4 and 5, as written, do not preserve linearity. Linearity can be preserved for left-linear and right-linear grammars (since those grammars describe regular languages, and regular languages are closed under these kinds of operations); but linearity cannot be preserved in general. To prove this, an example suffices:
R -> aRb | ab
T -> cTd | cd
L = RT = {a^n b^n c^m d^m | n, m ≥ 1}
L' = R* = (a^n b^n)*, n ≥ 1 (with n chosen independently in each repetition)
Suppose there were a linear grammar for L. We must have a production for the start symbol S that produces something. To produce something, we require a string of terminal and nonterminal symbols. To be linear, we must have at most one nonterminal symbol. That is, our production must be of the form
S := xYz
where x is a string of terminals, Y is a single nonterminal, and z is a string of terminals. If x is non-empty, reflection shows the only useful choice is a; anything else fails to derive known strings in the language. Similarly, if z is non-empty, the only useful choice is d. This gives four cases:
x empty, z empty. This is useless, since we now have the same problem to solve for nonterminal Y as we had for S.
x = a, z empty. Y must now generate exactly a^n' b^n' b c^m d^m where n' = n - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
x empty, z = d. Y must now generate exactly a^n b^n c c^m' d^m' where m' = m - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
x = a, z = d. Y must now generate exactly a^n' b^n' bc c^m' d^m' where n' and m' are as in 2 and 3. But then the exact same argument applies to the grammar whose start symbol is Y.
None of the possible choices for a useful production for S is actually useful in getting us closer to a string in the language. Therefore, no strings are derived, a contradiction, meaning that the grammar for L cannot be linear.
Suppose there were a linear grammar for L'. Then that grammar would have to generate all the strings in (a^n b^n)R(a^m b^m), plus those in e + R. But it can't generate the former, by the argument used above: any production useful for that purpose would get us no closer to a string in the language.
How can it be that the rule "Aa -> aA" is context-sensitive? According to the definition, context-sensitive rules have to be of this form:
αAβ → αγβ
where
A ∈ N, α,β ∈ (N∪Σ)* and γ ∈ (N∪Σ)+
Thanks.
It depends on what you mean. If you scroll down the Wikipedia entry, you can see that, formally,
cB → Bc
does not fit the scheme, but it can be simulated by 4 rules that do fit it:
c B → W B
W B → W X
W X → B X
B X → B c
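Applying the four rules in sequence swaps the two symbols:
cB → WB → WX → BX → Bc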
So Aa → aA is not a CSG rule in itself, but the language it helps generate is still context-sensitive. Perhaps whoever told you it is was using it as a shorthand (you could expand the definition of CSG rules to include these types of rules as "macros").
Consider following grammar:
A → BC
B → Ba | epsilon
C → bD | epsilon
D → …
…
The problem here is that B can derive epsilon and is left-recursive as well.
In order to find FIRST(A) I need FIRST(B).
But I'm stuck on FIRST(B), because it is left-recursive.
So what is FIRST(B)? And FIRST(A)?
My version is:
FIRST(B) = {a, epsilon}
FIRST(A) = {a, b, epsilon}
Is that correct?
Yes, you have it right. A left-recursion does not contribute to FIRST, because when Ba is matched for B, the B in Ba must start with something that B can start with - because it's a B, after all. :)
You could also instead factor out the left-recursion first.
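If you want to double-check FIRST sets mechanically, here is a minimal fixed-point sketch in Python (the encoding is mine, and D's body is a hypothetical placeholder since the original grammar leaves it unspecified):

EPS = "eps"

# A -> BC, B -> Ba | eps, C -> bD | eps; D -> d is a hypothetical placeholder.
grammar = {
    "A": [["B", "C"]],
    "B": [["B", "a"], [EPS]],
    "C": [["b", "D"], [EPS]],
    "D": [["d"]],
}

first = {nt: set() for nt in grammar}

def first_of(sym):
    # FIRST of a terminal is the terminal itself.
    return first[sym] if sym in grammar else {sym}

changed = True
while changed:
    changed = False
    for nt, rules in grammar.items():
        for rhs in rules:
            before = len(first[nt])
            nullable_so_far = True
            for sym in rhs:
                if sym == EPS:
                    continue  # epsilon only affects nullability
                first[nt] |= first_of(sym) - {EPS}
                if EPS not in first_of(sym):
                    nullable_so_far = False
                    break
            if nullable_so_far:
                first[nt].add(EPS)  # every symbol of the rhs was nullable
            if len(first[nt]) != before:
                changed = True

# The left-recursive B in B -> Ba adds nothing new to FIRST(B), as the answer says.
print(first)
# {'A': {'a', 'b', 'eps'}, 'B': {'a', 'eps'}, 'C': {'b', 'eps'}, 'D': {'d'}}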