How to construct a grammar that generates language L?

I'm in a Formal languages class and have a grammar quiz coming up. I'm assuming something like this will appear.
Consider the alphabet ∑ = {a, b, c}. Construct a grammar that generates the language L = {bab^nabc^na^p : n ≥ 0, p ≥ 1}. Assume that the start variable is S.

It has been a long time since I last worked with formal languages, so please forgive my rustiness, but this should be a grammar for the language. I read each exponent as applying only to the single symbol before it, so every string has the form ba b^n ab c^n a^p. We split S into a prefix variable (A) and a suffix variable (B) and handle them separately. Each part has a recursive rule and a base case: in the prefix, C wraps one b and one c around the fixed middle ab, which on its own covers n = 0; in the suffix, the base case is a single a because at least one occurrence is required.
{bab^nabc^na^p : n ≥ 0, p ≥ 1}
S -> AB
A -> baC
C -> bCc
C -> ab
B -> Ba
B -> a

Constructing a linear grammar for the language

I have difficulty constructing grammars for languages, especially linear grammars.
Can anyone please give me some basic tips or a methodology for constructing a grammar for any language? Thanks in advance.
I am not sure whether the following answer to the question "Construct a linear grammar for the language" is right:
L ={a^n b c^n | n belongs to Natural numbers}
Solution:
Right-Linear Grammar :
S--> aS | bA
A--> cA | ^
Left-Linear Grammar:
S--> Sc | Ab
A--> Aa | ^
As pointed out in the comments, these grammars are wrong since they generate strings not in the language. Here's a derivation of abcc in both grammars:
S -> aS -> abA -> abcA -> abccA -> abcc
S -> Sc -> Scc -> Abcc -> Aabcc -> abcc
Also as pointed out in the comments, there is a simple linear grammar for this language, where a linear grammar is defined as having at most one nonterminal symbol in the RHS of any production:
S -> aSc | b
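The difference between the three grammars can be checked by brute force. The following is only a sketch: it assumes single uppercase letters are nonterminals, writes the empty string as "" instead of ^, and relies on each grammar being linear so that bounding the number of terminals is enough to make the search finite. The names generate, right_linear, left_linear and linear are made up for this illustration, and n = 0 is included because S -> aSc | b derives b.

```python
from collections import deque

def generate(rules, start, max_len):
    """Collect every terminal string with at most max_len symbols.

    Assumes a linear grammar: uppercase letters are nonterminals and
    each sentential form holds at most one of them.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Index of the (only) nonterminal in the form, if there is one.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for ch in new if not ch.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

right_linear = {"S": ["aS", "bA"], "A": ["cA", ""]}
left_linear = {"S": ["Sc", "Ab"], "A": ["Aa", ""]}
linear = {"S": ["aSc", "b"]}

# a^n b c^n for n = 0..3, i.e. every word of the language up to length 7.
target = {"a" * n + "b" + "c" * n for n in range(4)}

for name, grammar in [("right-linear", right_linear),
                      ("left-linear", left_linear),
                      ("linear", linear)]:
    words = generate(grammar, "S", 7)
    extra = sorted(words - target, key=lambda w: (len(w), w))
    print(name, "matches" if words == target else f"also generates {extra[:3]}")
```

A bounded check like this can only expose counterexamples such as abcc; it does not prove that a grammar is correct for all lengths.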
There are some general rules for constructing grammars for languages. These are either obvious simple rules or rules derived from closure properties and the way grammars work. For instance:
1. if L = {a} for an alphabet symbol a, then S -> a is a grammar for L.
2. if L = {e} for the empty string e, then S -> e is a grammar for L.
3. if L = R U T for languages R and T, then S -> S' | S'', together with the grammars for R and T, is a grammar for L, where S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
4. if L = RT for languages R and T, then S -> S'S'', together with the grammars for R and T, is a grammar for L, where S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
5. if L = R* for language R, then S -> S'S | e, together with the grammar for R, is a grammar for L, where S' is the start symbol of the grammar for R.
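The union, concatenation and star rules are mechanical enough to write down as code. This is a minimal sketch under stated assumptions: a grammar is a pair (start symbol, dict from nonterminal to a list of right-hand-side tuples), the two grammars use disjoint nonterminal names, and the fresh start symbol "S" is unused in both. The helper names union, concat, star and words are invented here, and the bounded enumerator words is only adequate for grammars like these, where every recursive step adds a terminal.

```python
from collections import deque

def union(g1, g2, start="S"):
    (s1, r1), (s2, r2) = g1, g2
    return start, {**r1, **r2, start: [(s1,), (s2,)]}      # S -> S' | S''

def concat(g1, g2, start="S"):
    (s1, r1), (s2, r2) = g1, g2
    return start, {**r1, **r2, start: [(s1, s2)]}          # S -> S'S''

def star(g, start="S"):
    s1, r1 = g
    return start, {**r1, start: [(s1, start), ()]}         # S -> S'S | e

def words(grammar, max_len):
    """Terminal strings of length <= max_len, via leftmost derivations."""
    start, rules = grammar
    seen = {(start,)}
    queue = deque(seen)
    found = set()
    while queue:
        form = queue.popleft()
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:
            found.add("".join(form))
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Prune forms that already carry too many terminals.
            if sum(sym not in rules for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

R = ("R", {"R": [("a", "R", "b"), ("a", "b")]})   # a^n b^n, n > 0
T = ("T", {"T": [("c", "T", "d"), ("c", "d")]})   # c^m d^m, m > 0

print(sorted(words(union(R, T), 4)))
print(sorted(words(concat(R, T), 6)))
print(sorted(words(star(R), 4)))
```

Note that concat and star put two nonterminals on one right-hand side, which is exactly why the combined grammars stop being linear.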
Rules 4 and 5, as written, do not preserve linearity. Linearity can be preserved for left-linear and right-linear grammars (since those grammars describe regular languages, and regular languages are closed under these kinds of operations); but linearity cannot be preserved in general. To prove this, an example suffices:
R -> aRb | ab
T -> cTd | cd
L = RT = a^n b^n c^m d^m, 0 < n,m
L' = R* = (a^n b^n)*, 0 < n
Suppose there were a linear grammar for L. We must have a production for the start symbol S that produces something. To produce something, we require a string of terminal and nonterminal symbols. To be linear, we must have at most one nonterminal symbol. That is, our production must be of the form
S -> xYz
where x is a string of terminals, Y is a single nonterminal, and z is a string of terminals. If x is non-empty, reflection shows the only useful choice is a; anything else fails to derive known strings in the language. Similarly, if z is non-empty, the only useful choice is d. This gives four cases:
1. x empty, z empty. This is useless, since we now have the same problem to solve for nonterminal Y as we had for S.
2. x = a, z empty. Y must now generate exactly a^n' b^n' b c^m d^m where n' = n - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
3. x empty, z = d. Y must now generate exactly a^n b^n c c^m' d^m' where m' = m - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
4. x = a, z = d. Y must now generate exactly a^n' b^n' bc c^m' d^m' where n' and m' are as in 2 and 3. But then the exact same argument applies to the grammar whose start symbol is Y.
None of the possible choices for a useful production for S is actually useful in getting us closer to a string in the language. Therefore, no strings are derived, a contradiction, meaning that the grammar for L cannot be linear.
Suppose there were a linear grammar for L'. Then that grammar has to generate all the strings in (a^n b^n)R(a^m b^m), plus those in e + R. But it can't generate the ones in the former by the argument used above: any production useful for that purpose would get us no closer to a string in the language.

How can I construct a grammar that generates this language?

I'm studying for a finite automata & grammars test and I'm stuck with this question:
Construct a grammar that generates L:
L = {a^n b^m c^2n | n>=0, m>=0}
How can I construct a grammar that generates this language?
I think this should do the trick. I verified this on http://mdaines.github.io/grammophone/ .
S -> a S c c
| B
B -> b B
| .
I find it always helps with these kinds of questions to come up with some rules for how to build big strings out of little strings. First, identify the littlest strings in your language. In our case, we can start with the observation that if n = 0, b^m is in our language; that is, w in b* is in our language. We then note that if x is a string in our language we get another string by adding one a on the left and two cs on the right; that is, axcc is a string in our language also. So our rules are:
b* in L
if x in L then axcc in L
Writing this in terms of a CFG is now straightforward:
S -> B
S -> aScc
Here, S generates our language L and B generates the language b*. We complete the grammar by providing a grammar for b* with start symbol B:
(1) S -> B
(2) S -> aScc
(3) B -> e
(4) B -> bB
Any string a^n b^m c^2n can be generated using n applications of rule 2, 1 application of rule 1, m applications of rule 4 and 1 application of rule 3. That this grammar generates no strings not in the language is left as an exercise.
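The rule-counting argument can be replayed mechanically. The sketch below takes rule 4 to be B -> bB, which is what "m applications of rule 4" requires, and writes e as the empty string. The names RULES and derive are hypothetical, and the replacement always targets the first occurrence of the left-hand side, which is unambiguous here because each form contains at most one nonterminal.

```python
# Numbered productions as (left-hand side, right-hand side) pairs.
RULES = {
    1: ("S", "B"),
    2: ("S", "aScc"),
    3: ("B", ""),      # B -> e
    4: ("B", "bB"),
}

def derive(n, m):
    """Apply rule 2 n times, rule 1 once, rule 4 m times, then rule 3 once."""
    form = "S"
    steps = [form]
    for number in [2] * n + [1] + [4] * m + [3]:
        lhs, rhs = RULES[number]
        if lhs not in form:
            raise ValueError(f"rule {number} does not apply to {form!r}")
        form = form.replace(lhs, rhs, 1)
        steps.append(form)
    return form, steps

word, steps = derive(2, 1)
print(" => ".join(step or "e" for step in steps))
```

This only covers the direction stated above (every a^n b^m c^2n is derivable); the converse is still the exercise.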

How to determine if a context-free grammar describes a regular language?

Given an arbitrary context-free grammar, how can I check whether it describes a regular language?
I'm not looking for exam "tricks". I'm looking for a foolproof mechanical test that I can code.
If it helps, here's an example of a CFG that I might receive as an input.
Specifically, notice that the answer must be much more complicated than just looking for left- or right-recursion, since the presence of another type of recursion does not automatically imply the grammar is irregular.
S: A B C D X
A: A a
A:
B: b B
B:
C: c C c
C: c
D: D d D
D: d
X: x Y
X:
Y: y X
Y:
There is no such mechanical procedure because the problem of determining whether a CFG defines a regular language is undecidable.
This result is a simple application of Greibach's Theorem.

With respect to grammars, when are epsilon production rules allowed?

I'm trying to understand a concept with respect to grammar and Production Rules.
According to most material on this subject:
1) Epsilon production rules are only allowable if they do not appear on the RHS of any other production rule.
However, taking a grammar:
G = { T,N,P,S }
Where:
T = {a,b}
N = {S,S1}
S = {S}
P {
S -> aSb
S -> ab
S1 -> SS1
S1 -> E //Please note, using E to represent Epsilon.
}
Where, the language of the grammar is:
L(G) = { a^n b^n | n >= 1 }
In this case, a production rule containing Epsilon exists (derived from S1) but S1 also forms part of a RHS of another production rule (S1 -> SS1).
Doesn't this violate point 1?
Your statement:
Epsilon production rules are only allowable if they do not appear on the RHS of any other production rule.
would be better stated as
A non-terminal may have an epsilon production rule if that non-terminal does not appear on the right-hand side of any production rule.
In Chomsky's original hierarchy, epsilon productions were banned for all but Type 0 (unrestricted) grammars. If all epsilon productions are banned, then it is impossible for the grammar to produce the empty string. I believe this was not a concern for Chomsky, but most modern formulations allow the start symbol to have an empty right-hand side as long as the start symbol itself does not appear on the right-hand side of any production.
As it happens, the restriction on epsilon-productions is somewhat stronger than is necessary. In the case of both context-free grammars and regular grammars (Chomsky type 2 and type 3 grammars), it is always possible to create a weakly-equivalent grammar without epsilon productions (except possibly the single production S → ε if the grammar can produce the empty string.) It is also possible to remove a number of other anomalies which complicate grammar analysis: unreachable symbols, unproductive symbols, and cyclic productions. The result of the combination of all these eliminations is a "proper context-free grammar".
Consequently, most modern formulations of context-free grammars do not require the right-hand sides to be non-empty.
Your grammar G = {T, N, S, P} with
T = {a, b}
N = {S, S1}
S = {S}
P {
S → a S b
S → a b
S1 → S S1
S1 → ε
}
contains an unreachable symbol, S1. We can easily eliminate it, producing the equivalent grammar G' = { T, N', S, P' }:
N' = {S}
P' {
S → a S b
S → a b
}
G' does not contain any epsilon productions (but even if it had, they could have been eliminated).
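Removing unreachable symbols is a plain graph search from the start symbol. Here is a small sketch, assuming the productions are stored as a dict from each nonterminal to its list of right-hand sides (an empty list standing for ε); the names reachable, G and G_prime are made up for this example.

```python
def reachable(productions, start):
    """Return the nonterminals that can appear in a derivation from start."""
    seen = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for rhs in productions.get(head, []):
            for symbol in rhs:
                # Only nonterminals (keys of the dict) are followed.
                if symbol in productions and symbol not in seen:
                    seen.add(symbol)
                    stack.append(symbol)
    return seen

G = {
    "S": [["a", "S", "b"], ["a", "b"]],
    "S1": [["S", "S1"], []],          # [] is the epsilon production
}

keep = reachable(G, "S")
G_prime = {head: rules for head, rules in G.items() if head in keep}
print(sorted(keep))
print(G_prime)
```

S1 mentions S, but nothing reachable from S mentions S1, so S1 and both of its productions are dropped.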

What's the -> operator in Prolog and how can I use it?

I've read about it in a book but it wasn't explained at all. I also never saw it in a program. Is part of Prolog syntax? What's it for? Do you use it?
It represents implication. The righthand side is only executed if the lefthand side is true. Thus, if you have this code,
implication(X) :-
    (   X = a ->
        write('Argument a received.'), nl
    ;   X = b ->
        write('Argument b received.'), nl
    ;
        write('Received unknown argument.'), nl
    ).
Then it will write different things depending on its argument:
?- implication(a).
Argument a received.
true.
?- implication(b).
Argument b received.
true.
?- implication(c).
Received unknown argument.
true.
(link to documentation.)
It's a local version of the cut; see, for example, the section on control predicates in the SWI manual.
It is mostly used to implement if-then-else by (condition -> true-branch ; false-branch). Once the condition succeeds there is no backtracking from the true branch back into the condition or into the false branch, but backtracking out of the if-then-else is still possible:
?- member(X,[1,2,3]), (X=1 -> Y=a ; X=2 -> Y=b ; Y=c).
X = 1,
Y = a ;
X = 2,
Y = b ;
X = 3,
Y = c.
?- member(X,[1,2,3]), (X=1, !, Y=a ; X=2 -> Y=b ; Y=c).
X = 1,
Y = a.
Therefore it is called a local cut.
It is possible to avoid using it by writing something more wordy. If I rewrite Stephan's predicate:
implication(X) :-
    (
        X = a,
        write('Argument a received.'), nl
    ;
        X = b,
        write('Argument b received.'), nl
    ;
        X \= a,
        X \= b,
        write('Received unknown argument.'), nl
    ).
(Yeah I don't think there is any problem with using it, but my boss was paranoid about it for some reason, so we always used the above approach.)
With either version, you need to be careful that you are covering all cases you intend to cover, especially if you have many branches.
ETA: I am not sure if this is completely equivalent to Stephan's, because of backtracking if you have implication(X). But I don't have a Prolog interpreter right now to check.