Formal Languages - Grammar

I am taking a Formal Languages and Computability class and am having a little trouble understanding the concept of grammar. One of my assignment questions is this:
Take ∑ = {a,b}, and let na(w) and nb(w) denote the number of a's and b's in the string w, respectively. Then the grammar G with productions:
S -> SS
S -> λ
S -> aSb
S -> bSa
generates the language L = {w: na(w) = nb(w)}.
1) The language in the example contains an empty string. Modify the given grammar so that it generates L - {λ}.
I am thinking that I should modify the condition of L, something like:
L = {w: na(w) = nb(w), na, nb > 0}
That way, we indicate that the string is never empty.
2) Modify the grammar in the example so that it will generate L ∪ {a^n b^(n+1): n >= 0}.
I am not sure on how to do this one. Should that mean I make one more condition in the grammar, adding something like S -> aSbb?
Any explanation of these two questions would be greatly appreciated. I'm still trying to figure this grammar stuff out, so I am not sure about my answers.

1) The question is about modifying the grammar to obtain a new language, so don't modify the language directly.
Your grammar generates the empty word because of the production:
S -> λ
So you could think of removing this production altogether. This yields the following grammar:
S -> SS
S -> aSb
S -> bSa
Unfortunately, this grammar generates only the empty language: no production consists solely of terminals, so no derivation can ever terminate (a bit like an induction that is missing its base case). To fix this, add the following productions:
S -> ab
S -> ba
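One way to sanity-check a grammar like this is to enumerate every string it derives up to some length and compare the result against the definition of the language. The sketch below does that by fixed-point iteration. The helper name language_up_to and the dict encoding of the productions are assumptions made for illustration; they are not part of the exercise.

```python
from itertools import product


def language_up_to(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    `productions` maps each non-terminal (one character) to its list of
    right-hand sides; "" would stand for the empty string.
    """
    lang = {nt: set() for nt in productions}
    changed = True
    while changed:  # repeat until no non-terminal gains a new string
        changed = False
        for nt, rhss in productions.items():
            for rhs in rhss:
                partial = {""}
                for sym in rhs:
                    # A non-terminal contributes what it is known to derive;
                    # a terminal contributes only itself.
                    choices = lang[sym] if sym in productions else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang[start]


# Hypothetical encoding of the modified grammar from the answer.
grammar = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}
generated = language_up_to(grammar, "S", 6)

# Reference: every non-empty string over {a, b} with as many a's as b's.
expected = {"".join(t)
            for n in range(1, 7)
            for t in product("ab", repeat=n)
            if t.count("a") == t.count("b")}

print(generated == expected)
```

A bounded check like this is not a proof, but a mismatch at small lengths would immediately point to a missing or extra production.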
2) Don't randomly try to add production rules in the hope that it's going to work. Here you want a's followed by b's. So the production rule
S -> bSa
must certainly disappear. Also, the rule
S -> SS
would produce, e.g., abab (try to see how this is obtained). So we'll have to remove it too. We're left with:
S -> λ
S -> aSb
Now this grammar generates:
λ
ab
aabb
aaabbb
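etc. That's not bad at all! To get an extra trailing b, we could create a new non-terminal, say T, replace our current S by T, and add that trailing b in S:
T -> λ
T -> aTb
S -> Tb
This generates exactly {a^n b^(n+1): n >= 0}. The question asks for the union with L, so keep the original four productions as well, under their own non-terminal (say E in place of S), and let the start symbol choose between the two parts: S -> E and S -> Tb.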
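As a small illustration of the three productions T -> λ, T -> aTb and S -> Tb, the sketch below replays a derivation by plain string replacement. The function name derive is hypothetical, and the code covers only the a^n b^(n+1) part, not the union with L.

```python
def derive(n):
    """Derive a string from S, using T -> aTb exactly n times."""
    form = "Tb"                          # S -> Tb
    for _ in range(n):
        form = form.replace("T", "aTb")  # T -> aTb
    return form.replace("T", "")         # T -> λ


for n in range(4):
    print(n, derive(n))
```

Each application of T -> aTb adds one a on the left and one b on the right, so the single extra b contributed by S -> Tb is always kept.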
I know that this is homework; I gave you the solutions to your homework: that's because, from the way you asked your question, it seems you're completely lost. I hope this answer will help you get on the right path!

Definition of First and Follow sets of the right-hand sides of production

I am learning about LL(1) grammars. My task is to check whether a grammar is LL(1) and, if it is not, to find the rules that prevent it from being LL(1). I came across this link https://www.csd.uwo.ca/~mmorenom/CS447/Lectures/Syntax.html/node14.html which has a theorem that can be used as a criterion for deciding whether a grammar is LL(1). It says that for any rule A -> alpha | beta, some equalities involving FIRST and FOLLOW sets need to hold. Therefore, I need to find the FIRST and FOLLOW sets of these right-hand sides.
Let's say I have the following rules: A -> a b B S | eps. How do I calculate FIRST and FOLLOW of a b B S? As far as I understand, by definition these sets are defined only for a single non-terminal symbol.
The idea behind the FIRST function is that it returns the set of terminals which could possibly start the expansion of its argument. It's usual to also add the special object ε (which is a way of writing an empty sequence of symbols) if ε is a possible expansion.
So if a is a terminal, FIRST(a) is just { a }. And if A is a non-terminal, FIRST(A) is the set of terminals which could possibly appear at the beginning of a derivation of A (plus ε if A can derive the empty sequence). Finally, FIRST(ε) must be { ε }, according to the convention described above.
Now suppose α is a (possibly empty) sequence of grammar symbols:
If α is empty (that is, it's ε), FIRST(α) is { ε }
If the first symbol in α is the terminal a, FIRST(α) is { a }.
If the first symbol in α is the non-terminal A, there are two possibilities. Let TAIL(α) be the rest of α after the first symbol. Now:
if ε ∈ FIRST(A), then FIRST(α) is (FIRST(A) − { ε }) ∪ FIRST(TAIL(α)).
otherwise, FIRST(α) is FIRST(A).
Now, how do we compute FIRST(A), for every non-terminal A? Using the above definition of FIRST(α), we recursively define FIRST(A) to be the union of the sets FIRST(α) for every α which is the right-hand side of a production A → α.
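The recursive definition can be evaluated as a fixed point: start with empty FIRST sets and keep applying the rules until nothing changes. The sketch below is one way to do that. The grammar in it is hypothetical (the rules for B and S are invented so that the asker's production A -> a b B S | ε has something to refer to), and the function names are assumptions.

```python
EPS = "ε"


def first_of_sequence(seq, first):
    """FIRST of a sequence of symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                  # this symbol cannot vanish
    result.add(EPS)                        # every symbol can vanish (or seq is empty)
    return result


def compute_first(grammar):
    """Fixed-point computation of FIRST for every non-terminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Hypothetical grammar; an empty list stands for an ε right-hand side.
grammar = {
    "A": [["a", "b", "B", "S"], []],
    "B": [["c"], []],
    "S": [["d", "A"]],
}
first = compute_first(grammar)
print(first)
print(first_of_sequence(["a", "b", "B", "S"], first))
print(first_of_sequence(["B", "S"], first))
```

Because the right-hand side a b B S starts with the terminal a, its FIRST set is just { a } no matter what B and S derive.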
The FOLLOW function defines the set of terminal symbols which might appear after the expansion of a non-terminal. It is only defined on non-terminals; if you look carefully at the LL(1) conditions on the page you cite, you'll see that FIRST is applied to a right-hand side, while FOLLOW is only applied to left-hand sides.

Computation of follow set

To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set.
Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
If there is a production A -> aBb, then everything in FIRST(b) except epsilon is in FOLLOW(B).
If there is a production A -> aB, or a production A -> aBb where FIRST(b) contains epsilon, then everything in FOLLOW(A) is in FOLLOW(B).
Here a and b are actually alpha and beta (sentential forms). This is from the Dragon book.
Now my question is: in this case, can we take a = epsilon?
And can b (beta) be two non-terminals like XY? (If it is a sentential form, then it should be.)
Here's what the Dragon book actually says: [See note 1]
Place $ in FOLLOW(S).
For every production A→αBβ, place everything in FIRST(β) except ε into FOLLOW(B).
For every production A→αB, or A→αBβ where FIRST(β) contains ε, place FOLLOW(A) into FOLLOW(B).
There is a section earlier in the book on "notational conventions" in which it is made clear that a lower-case Greek letter like α or β represents a possibly empty string of grammar symbols. So, yes, α could be empty and β could be two non-terminals (or any other string of grammar symbols).
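To make the three rules concrete, here is a minimal sketch that applies them until no FOLLOW set grows. The grammar is a hypothetical one chosen so that, in S -> B X Y, α is empty and β is the two non-terminals XY; all function names are assumptions.

```python
EPS, END = "ε", "$"


def first_of(seq, first):
    """FIRST of a (possibly empty) sequence of grammar symbols."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})      # a terminal's FIRST is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # rule 1
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in grammar:         # only non-terminals have FOLLOW
                        continue
                    beta_first = first_of(rhs[i + 1:], first)
                    new = beta_first - {EPS}     # rule 2
                    if EPS in beta_first:        # rule 3: β empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


# Hypothetical grammar: in S -> B X Y, α is empty and β is "X Y".
grammar = {
    "S": [["B", "X", "Y"]],
    "B": [["b"]],
    "X": [["x"], []],
    "Y": [["y"], []],
}
first = compute_first(grammar)
follow = compute_follow(grammar, "S", first)
print(follow)
```

Since both X and Y can derive ε here, FIRST(XY) contains ε, so FOLLOW(B) picks up FOLLOW(S) in addition to x and y.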
Note:
Here I'm using a variant on the formatting suggestion made by @leftroundabout in this meta post. (The only difference is that I put the formulae in bold.) It's easy to type Greek letters as entities if you don't have a Greek keyboard handy; just use, for example, &alpha; (α) or &beta; (β). For upper-case Greek letters, write the name with an upper-case letter: &Sigma; (Σ). Other useful symbols are arrows: &rarr; (→) and &rArr; (⇒).

Is this a regular grammar: S -> 0S0/00?

Let L denote the language generated by the grammar S -> 0S0/00. Which of the following is true?
(A) L = 0+
(B) L is regular but not 0+
(C) L is context free but not regular
(D) L is not context free
Hi, can anyone explain to me how the language represented by the grammar S -> 0S0/00 is regular? I know very well that the grammar is context free, but I am not sure how the language can be regular.
If you mean the language generated by the grammar
S -> 0S0
S -> 00
then it should be clear that it is the same language as is generated by
S -> 00S
S -> 00
which is a right regular grammar, and consequently generates a regular language. (Some people would say that a right regular grammar can only have a single terminal in each production, but it is trivial to create a chain of aN productions to produce the same effect.)
It should also be clear that the above differs from
S -> 0S
S -> 0
which generates 0+.
We know that a language is regular if there exists a DFA (deterministic finite automaton) that recognizes it, or an RE (regular expression) that describes it. Either way, we can see here that your grammar generates words like 00, 0000, 000000, 00000000, etc., so its words start and end with '0' and have an even number of zeroes, with length at least two.
Here's a DFA for this grammar: [diagram not preserved]
Also, here is an RE (regular expression) that describes the language:
(0)(00)*(0)
Therefore you know the language generated by this grammar is regular.
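A DFA for this language can be simulated in a few lines. The three-state machine below is an assumption (one natural choice, with hypothetical state names q0, q1, q2), not necessarily the automaton the answer had in mind: q0 is the start state, q1 means an odd number of zeroes has been read, and q2 means a non-zero even number has been read.

```python
def accepts(word):
    """Simulate a three-state DFA for non-empty, even-length strings of 0s."""
    transitions = {
        ("q0", "0"): "q1",
        ("q1", "0"): "q2",
        ("q2", "0"): "q1",
    }
    state = "q0"
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:      # symbol outside the alphabet {0}
            return False
    return state == "q2"       # q2 is the only accepting state


for w in ["", "0", "00", "000", "0000"]:
    print(repr(w), accepts(w))
```

The start state is kept separate from q2 so that the empty string is rejected.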
(Sorry if the terms aren't 100% accurate; I took this class in French, so terms might differ a bit.) Let me know if you have any other questions!
Consider first the definition of a regular grammar here
https://www.cs.montana.edu/ross/theory/contents/chapter02/green/section05/page04.xhtml
So first we need a set N of non-terminal symbols (symbols that can be rewritten as a combination of terminal and non-terminal symbols); the given grammar has N={S}.
Next we need a set T of terminal symbols (symbols that cannot be replaced), for our example T={0}
Now a set P of grammar rules that fit a very specific form (see link): each non-terminal is replaced by a terminal, by a terminal followed by a non-terminal, or by the empty string. The rules as given, P={S->0S0,S->00}, are not of this form, because S->0S0 puts a terminal after the non-terminal and S->00 has two terminals. The same language is generated by rules that do fit, using one extra non-terminal A: P={S->0A,A->0S,A->0}.
Now we just need a starting symbol X; we can trivially say that our starting symbol is S.
Therefore the tuple (N={S,A},T={0},P={S->0A,A->0S,A->0},X=S) is a regular grammar for L, which shows that L is regular even though the original grammar is not in regular form.
We don't need the machinery of regular grammars to answer your question. Just note the possible derivations all look like this:
S -> (0 S 0) -> 0 (0 S 0) 0 -> 0 0 (0 S 0) 0 0 -> ... -> 0...0 (0 0) 0...0
                                                         \___/       \___/
                                                           k           k
Here I've added parens ( ) to show the result of the previous expansion of S. These aren't part of the derived string. I.e. we substitute S with 0 S 0 k >= 0 times followed by a single substitution with 00.
From this it should be easy to see L is the set of strings of 0's of length 2k + 2 for some integer k >= 0. A shorthand notation for this is
L = { 0^(2m) | m >= 1 }
In words: The set of all even length strings of zeros excluding the empty string.
To prove L is regular, all we need is a regular expression for L. This is easy: (00)+. Or if you prefer, 00(00)*.
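As a quick check, the derivation pattern above can be replayed in code and tested against the regular expression. This is a sketch only; the function name derive is hypothetical.

```python
import re


def derive(k):
    """Apply S -> 0S0 exactly k times, then finish with S -> 00."""
    form = "S"
    for _ in range(k):
        form = form.replace("S", "0S0")
    return form.replace("S", "00")


pattern = re.compile(r"(00)+")

for k in range(5):
    word = derive(k)
    print(k, len(word), pattern.fullmatch(word) is not None)
```

Every derived word has length 2k + 2 and matches (00)+, while odd-length strings of zeroes and the empty string do not match.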
You might be confused because a small change to the grammar makes its language context free but not regular:
S -> 0S1/01
This is the more complex language { 0^m 1^m | m >= 1 }. It's straightforward to show this isn't a regular language using the Pumping Lemma.

regular/context free grammar

I'm hoping someone can help me understand a question I have. It's not homework, it's just an example question I am trying to work out. The problem is to define a grammar that generates all the sums of any number of operands. For example, 54 + 3 + 78 + 2 + 5... etc. The way that I found easiest to define the problem is:
non-terminal {S,B}
terminal {0..9,+,epsilon}
Rules:
S -> [0..9]S
S -> + B
B -> [0..9]B
B -> + S
S -> epsilon
B -> epsilon
epsilon is an empty string.
This seems like it should be correct to me as you could define the first number recursively with the first rule, then to add the next integer, you could use the second rule and then define the second integer using the third rule. You could then use the fourth rule to go back to S and define as many integers as you need.
This solution seems to me to be a regular grammar, as it obeys the rule A -> aB or A -> a, but the notes say for this question that it is not possible to define this problem using a regular grammar. Can anyone please explain to me why my attempt is wrong and why this needs to be context free?
Thanks.
Although it's not the correct definition, it's easier to think that for a language to be non-regular it would need to balance something (like parentheses).
Your problem can be solved using direct recursion only on the sides of the rules, not in the middle, so it can be solved using a regular language. (Again, this is not the correct definition, but it's easier to remember!)
For example, for a regular expression engine (like in Perl or JavaScript) one could easily write /(\d+)(\+(\d+))*/.
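The same idea can be tried out in Python. This is a minimal sketch: the name is_sum is hypothetical, [0-9] is used instead of \d so that only ASCII digits count, and spaces around + are not allowed.

```python
import re

# One or more digits, followed by any number of "+digits" groups.
SUM = re.compile(r"[0-9]+(\+[0-9]+)*")


def is_sum(text):
    """True if the whole text is a sum of one or more unsigned integers."""
    return SUM.fullmatch(text) is not None


for candidate in ["54+3+78+2+5", "7", "54+", "+3", "5++6", ""]:
    print(repr(candidate), is_sum(candidate))
```

Using fullmatch rather than search matters here: the pattern has to cover the entire input, otherwise a string like 54+ would be accepted on the strength of its prefix.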
You could write it this way:
non-terminal {S,R,N,N'}
terminal {0..9,+,epsilon}
Rules:
S -> N R
S -> epsilon
N -> [0..9] N'
N' -> N
N' -> epsilon
R -> + N R
R -> epsilon
Which should work correctly.
The language is regular. A regular expression would be:
((0|1|2|...|9)*(0|1|2|...|9)+)*(0|1|2|...|9)*(0|1|2|...|9)
Terminals are: {0,1,2,...,9,+}
"|" means union and * stands for star closure; the + in the expression is the literal terminal symbol, not an operator.
If you need to have "(" and ")" in your language, then it will not be regular as it needs to match parentheses.
A sample context free grammar would be:
E->E+E
E->(E)
E->F
F-> 0F | 1F | 2F | ... | 9F | 0 | 1 | ... | 9

Defining a language in EBNF

Give the EBNF specification for the language L that is made up of the chars a, b and c such that sentences in the language have the form
L : s q s^R
- s is a string of any combination of the characters a and b
- s^R is that same string s reversed
- q is an odd number of c's followed by either an odd number of b's or an even number of a's.
What I have so far:
L -> S
S -> {a}{b}Q
Q ->
If this is right, I'm still not really sure how to produce from Q and also how to represent S in reverse.
This is a string that starts and ends with the same string, but reversed:
X -> aXa
-> bXb
This is a string with an odd number of c's:
Y -> cY2
Y2 -> ccY2
I've left out some crucial bits, but hopefully this can get you started.
Try building the first two parts from the middle out.
You can force an odd number of repetitions by starting with exactly one item and adding N*2 additional items (for integer N). This should suggest how to force an even number as well.