How can I derive strings from this grammar?

G = (V = {S, X, Y}, T = {0, 1, 2}, P, S)
S -> 0X1
X -> S | 00S2 | Y | ε
Y -> X | 1
The problem is that I don't know how to derive strings of digits.
How can I derive this:
00111 ∈ L(G)
And for this one I have to give a derivation tree:
0000121 ∈ L(G)

To do a derivation you start with the start symbol (in this case S, which is the fourth item in the grammar tuple). You then apply the production rules (P) in whatever order seems appropriate to you.
A production like:
X → S | 00S2 | Y | ε
means that you can replace an X with
S, or
00S2, or
Y, or
nothing.
In other words, you read production rules as follows:
→ means "can be replaced with".
| means "or".
ε means "nothing" (Replacing a symbol with nothing means deleting it from the current string.)
Everything else is just a possible symbol in the string. You keep doing replacements, one at a time, until you reach the string you are trying to derive.
Here's a quick example:
S
→ 0X1 (using S → 0X1)
→ 000S21 (using X → 00S2)
→ 0000X121 (using S → 0X1)
→ 0000121 (using X → ε)
That's it. Nothing complicated at all. Just a bunch of search-and-replace operations. (If a symbol occurs more than once, you don't need to replace the first occurrence; you can do the replacements in any order you like. But it's convenient to be systematic.)
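If it helps to see that these really are just textual replacements, here is a small Python sketch (my own illustration, not part of the original answer) that replays the derivation above as string replacements, using "" to play the role of ε:

# Replay the derivation of 0000121 as plain string replacements.
# Each step names the production used; "" plays the role of ε.
steps = [
    ("S", "0X1"),   # S -> 0X1
    ("X", "00S2"),  # X -> 00S2
    ("S", "0X1"),   # S -> 0X1
    ("X", ""),      # X -> ε
]

current = "S"
for lhs, rhs in steps:
    current = current.replace(lhs, rhs, 1)  # rewrite the first occurrence
    print(current)  # 0X1, 000S21, 0000X121, 0000121

assert current == "0000121"

The same approach handles 00111: S → 0X1 → 0S1 → 00X11 → 00Y11 → 00111, using X → S, S → 0X1, X → Y and Y → 1 for the inner steps.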

Definition of First and Follow sets of the right-hand sides of production

I am learning about LL(1) grammars. I have the task of checking whether a grammar is LL(1) and, if it is not, I then need to find the rules which prevent it from being LL(1). I came across this link https://www.csd.uwo.ca/~mmorenom/CS447/Lectures/Syntax.html/node14.html which has a theorem that can be used as a criterion for deciding whether a grammar is LL(1). It says that for any rule A -> alpha | beta, certain conditions involving the FIRST and FOLLOW sets need to hold. Therefore, I need to find the FIRST and FOLLOW sets of these right-hand sides of productions.
Let's say I have the following rule: A -> a b B S | eps. How do I calculate FIRST and FOLLOW of a b B S? As far as I understand, by definition these sets are defined only for a single non-terminal symbol.
The idea behind the FIRST function is that it returns the set of terminals which could possibly start the expansion of its argument. It's usual to also add the special object ε (which is a way of writing an empty sequence of symbols) if ε is a possible expansion.
So if a is a terminal, FIRST(a) is just { a }. And if A is a non-terminal, FIRST(A) is the set of terminals which could possibly appear at the beginning of a derivation of A. Finally, FIRST(ε) must be { ε }, according to the convention described above.
Now suppose α is a (possibly empty) sequence of grammar symbols:
If α is empty (that is, it's ε), FIRST(α) is { ε }.
If the first symbol in α is the terminal a, FIRST(α) is { a }.
If the first symbol in α is the non-terminal A, there are two possibilities. Let TAIL(α) be the rest of α after the first symbol. Now:
if ε ∈ FIRST(A), then FIRST(α) is (FIRST(A) ∖ { ε }) ∪ FIRST(TAIL(α));
otherwise, FIRST(α) is FIRST(A).
Now, how do we compute FIRST(A), for every non-terminal A? Using the above definition of FIRST(α), we recursively define FIRST(A) to be the union of the sets FIRST(α) for every α which is the right-hand side of a production A → α.
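If you want to check a hand computation, here is a minimal Python sketch of that fixed-point construction. The helper names are mine, and since the question only gives the rule for A, the productions for B and S in the example grammar are made-up placeholders:

EPS = "ε"

def first_of(alpha, first, nonterminals):
    # FIRST of a (possibly empty) sequence of grammar symbols alpha,
    # given the FIRST sets computed so far for the non-terminals.
    result = set()
    for sym in alpha:
        if sym not in nonterminals:        # a terminal: it is the first symbol
            result.add(sym)
            return result
        result |= first[sym] - {EPS}       # a non-terminal that might not vanish
        if EPS not in first[sym]:
            return result
    result.add(EPS)                        # every symbol of alpha can derive ε
    return result

def first_sets(grammar):
    # grammar: dict mapping a non-terminal to a list of right-hand sides,
    # each right-hand side a tuple of symbols; the empty tuple stands for ε.
    nonterminals = set(grammar)
    first = {A: set() for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for alpha in rhss:
                new = first_of(alpha, first, nonterminals)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

# The rule from the question plus made-up productions for B and S:
grammar = {
    "A": [("a", "b", "B", "S"), ()],   # A -> a b B S | ε
    "B": [("b",)],                     # B -> b   (assumed for the example)
    "S": [("c",)],                     # S -> c   (assumed for the example)
}
print(first_sets(grammar))   # FIRST(A) = {a, ε}, FIRST(B) = {b}, FIRST(S) = {c}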
The FOLLOW function defines the set of terminal symbols which might appear after the expansion of a non-terminal. It is only defined on non-terminals; if you look carefully at the LL(1) conditions on the page you cite, you'll see that FIRST is applied to a right-hand side, while FOLLOW is only applied to left-hand sides.

Computation of follow set

To compute FOLLOW(A) for all non-terminals A, apply the following rules
until nothing can be added to any FOLLOW set.
Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
If there is a production A -> aBb, then everything in FIRST(b) except epsilon is in FOLLOW(B).
If there is a production A -> aB, or a production A -> aBb where FIRST(b) contains epsilon, then everything in FOLLOW(A) is in FOLLOW(B).
Here a and b are actually alpha and beta (sentential forms). This is from the Dragon book.
Now my question is: in this case, can we take a = epsilon?
And can b (beta) be two non-terminals, like XY? (If it is a sentential form, then it should be possible.)
Here's what the Dragon book actually says: [See note 1]
Place $ in FOLLOW(S).
For every production A → αBβ, place everything in FIRST(β) except ε into FOLLOW(B).
For every production A → αB or A → αBβ where FIRST(β) contains ε, place FOLLOW(A) into FOLLOW(B).
There is a section earlier in the book on "notational conventions" in which it is made clear that a lower-case Greek letter like α or β represents a possibly empty string of grammar symbols. So, yes, α could be empty and β could be two nonterminals (or any other string of grammar symbols).
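Continuing the Python sketch from the FIRST answer above (it reuses EPS, first_of, first_sets and the same hypothetical example grammar; this is just my illustration of the three rules, not code from the book):

def follow_sets(grammar, start, first):
    nonterminals = set(grammar)
    follow = {A: set() for A in nonterminals}
    follow[start].add("$")                       # rule 1: $ ∈ FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for alpha in rhss:
                for i, B in enumerate(alpha):
                    if B not in nonterminals:
                        continue                 # FOLLOW is only defined for non-terminals
                    beta = alpha[i + 1:]         # β may be empty or hold several symbols
                    first_beta = first_of(beta, first, nonterminals)
                    added = first_beta - {EPS}   # rule 2: FIRST(β) \ {ε} ⊆ FOLLOW(B)
                    if EPS in first_beta:        # rule 3: β is empty or can vanish,
                        added |= follow[A]       #         so FOLLOW(A) ⊆ FOLLOW(B)
                    if not added <= follow[B]:
                        follow[B] |= added
                        changed = True
    return follow

first = first_sets(grammar)
print(follow_sets(grammar, "A", first))   # FOLLOW(A) = {$}, FOLLOW(B) = {c}, FOLLOW(S) = {$}

Running the rules to a fixed point, as in the while loop above, is exactly what "apply the rules until nothing can be added to any FOLLOW set" means.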
Note:
Here I'm using a variant on the formatting suggestion made by @leftroundabout in this meta post. (The only difference is that I put the formulae in bold.) It's easy to type Greek letters as entities if you don't have a Greek keyboard handy; just use, for example, &alpha; (α) or &beta; (β). For upper-case Greek letters, write the name with an upper-case letter: &Sigma; (Σ). Other useful symbols are arrows: &rarr; (→) and &rArr; (⇒).

regular/context free grammar

I'm hoping someone can help me understand a question I have. It's not homework, just an example question I am trying to work out. The problem is to define a grammar that generates all the sums of any number of operands, for example 54 + 3 + 78 + 2 + 5, etc. The way that I found most easy to define the problem is:
non-terminal {S,B}
terminal {0..9,+,epsilon}
Rules:
S -> [0..9]S
S -> + B
B -> [0..9]B
B -> + S
S -> epsilon
B -> epsilon
epsilon is the empty string.
This seems like it should be correct to me: you can define the first number recursively with the first rule, then to add the next integer you use the second rule, and define the second integer using the third rule. You can then use the fourth rule to go back to S and define as many integers as you need.
This solution seems to me to be a regular grammar, as it obeys the rule A -> aB or A -> a, but the notes say for this question that it is not possible to define this problem using a regular grammar. Can anyone please explain to me why my attempt is wrong and why this needs to be context-free?
Thanks.
Although it's not the correct definition, it's easier to think that for a language to be non-regular it would need to balance something (like parentheses).
Your problem can be solved using direct recursion only at the ends of the rules, not in the middle, so it can be described by a regular language. (Again, this is not the correct definition, but it's easier to remember!)
For example, for a regular expression engine (like in Perl or JavaScript) one could easily write /(\d+)(\+(\d+))*/.
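A quick way to convince yourself is to try that same pattern in Python (fullmatch anchors it to the whole string; the test strings below are just my own examples):

import re

sums = re.compile(r"(\d+)(\+(\d+))*")   # a number, then any number of "+number" parts

for s in ["54+3+78+2+5", "7", "54+", "+3"]:
    print(s, bool(sums.fullmatch(s)))
# 54+3+78+2+5 True
# 7 True
# 54+ False
# +3 False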
You could write it this way:
non-terminal {S,R,N,N'}
terminal {0..9,+,epsilon}
Rules:
S -> N R
S -> epsilon
N -> [0..9] N'
N' -> N
N' -> epsilon
R -> + N R
R -> epsilon
Which should work correctly.
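If you want to test the grammar above, here is a small hand-rolled recognizer for it in Python (a sketch of mine, with one function per non-terminal; each function returns the input position it reached, or None on failure):

def parse_S(s, i=0):
    if i == len(s):                     # S -> epsilon
        return i
    j = parse_N(s, i)                   # S -> N R
    return parse_R(s, j) if j is not None else None

def parse_N(s, i):
    if i < len(s) and s[i].isdigit():   # N -> [0..9] N'
        return parse_Nprime(s, i + 1)
    return None

def parse_Nprime(s, i):
    j = parse_N(s, i)                   # N' -> N
    return j if j is not None else i    # N' -> epsilon

def parse_R(s, i):
    if i < len(s) and s[i] == "+":      # R -> + N R
        j = parse_N(s, i + 1)
        return parse_R(s, j) if j is not None else None
    return i                            # R -> epsilon

def accepts(s):
    return parse_S(s) == len(s)

print(accepts("54+3+78+2+5"), accepts("54+"), accepts("+3"))   # True False False

Because recursion only occurs at the right end of each rule, no backtracking is ever needed, which is another way of seeing that the language is regular.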
The language is regular. A regular expression for it would be:
((0|1|2|...|9)*(0|1|2|...|9)+)*(0|1|2|...|9)*(0|1|2|...|9)
Terminals are: {0,1,2,...,9,+}
"|" means union, "*" stands for the Kleene star, and the "+" above is the terminal plus sign, not the Kleene plus.
If you need to have "(" and ")" in your language, then it will not be regular, as it needs to match parentheses.
A sample context-free grammar would be:
E->E+E
E->(E)
E->F
F-> 0F | 1F | 2F | ... | 9F | 0 | 1 | ... | 9

Kleene Closure in Chomsky Normal Form

Let n be any terminal.
Consider the following, presumably correct, representation of the Kleene star over n:
N → n N | ε
(where ε denotes the empty string.)
Wikipedia says:
Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form.
I cannot see how the above grammar could be transformed to CNF.
Is the grammar not context-free?
Is there in fact a way to represent it in CNF?
Fortunately, this can be written in CNF. Here is one such grammar:
S → ε | n | NA
N → n
A → n | NA
Therefore, the language is context-free.
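As a sanity check, you can run the textbook CYK algorithm on this grammar; the sketch below is my own code and confirms that the grammar accepts exactly the strings n, nn, nnn, ... plus the empty string:

# CNF grammar from above, split into terminal rules (X -> a) and
# binary rules (X -> Y Z).  S -> ε is handled as a special case, which
# CNF permits because S never appears on a right-hand side.
terminal_rules = {"S": {"n"}, "N": {"n"}, "A": {"n"}}
binary_rules = {"S": {("N", "A")}, "A": {("N", "A")}}

def cyk_accepts(word, start="S"):
    if word == "":
        return True
    n = len(word)
    # table[i][length] = set of non-terminals deriving word[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for X, terms in terminal_rules.items():
            if ch in terms:
                table[i][1].add(X)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for X, pairs in binary_rules.items():
                    for Y, Z in pairs:
                        if Y in table[i][split] and Z in table[i + split][length - split]:
                            table[i][length].add(X)
    return start in table[0][n]

print([cyk_accepts("n" * k) for k in range(5)])   # [True, True, True, True, True]
print(cyk_accepts("nx"))                          # False: x is not a terminal here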
Hope this helps!

Specifying language for a given grammar

DFA problem: Write a complete grammar for L, including the quadruple and the production rules.
L = {x : ∃y ∈ {a, b}* : x = ay}
Answer:
G={{S, A}, {a, b}, S, P}
P: S => aA
A => aA | bA | λ
My question is :
Why is there a λ for A, but no λ for S?
From the language definition, it is any string that begins with an a and contains only a's and b's, but why does the answer include A => bA? Doesn't that mean the string starts with b if A => bA?
Thank you so much
1. Why is there a λ for A, but no λ for S?
λ (the null string) can be derived from A in order to turn a sentential form into a sentence. Additionally, according to the language definition, the suffix y ∈ {a, b}* can be null (an empty string), so for example "a" by itself is a string that belongs to the language. If y contains any symbols, the string is longer than one symbol.
S doesn't derive λ because the empty (null) string is not in the language. The smallest string in the language is the single "a".
2. From the language definition, it is any string that begins with an a and contains only a's and b's, but why does the answer include A => bA? Doesn't that mean the string starts with b if A => bA?
Note that only strings that can be derived from the start variable S are included in the language of the grammar. You can't start a derivation from A (it is not the start variable), and if you start a derivation from S, your string will always start with the symbol a.
I suggest you read "Why the need for terminals? Is my solution sufficient enough?", where I wrote about the basic definition of a formal grammar.
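If it helps, here is a small brute-force Python sketch (my own, not part of the original answer; the rules dict and the MAX_LEN bound are mine) that expands the grammar S => aA, A => aA | bA | λ and checks that the strings it derives, up to a bounded length, are exactly the strings over {a, b} that start with a:

import re
from itertools import product

rules = {"S": ["aA"], "A": ["aA", "bA", ""]}   # "" plays the role of λ
MAX_LEN = 5

derived = set()
forms = ["S"]
while forms:
    next_forms = []
    for form in forms:
        nt = next((c for c in form if c in rules), None)
        if nt is None:                        # no non-terminal left: a sentence
            derived.add(form)
            continue
        for rhs in rules[nt]:
            new = form.replace(nt, rhs, 1)    # rewrite the (only) non-terminal
            if sum(c not in rules for c in new) <= MAX_LEN:   # prune long forms
                next_forms.append(new)
    forms = next_forms

# Every derived string starts with "a" and uses only a's and b's ...
print(all(re.fullmatch(r"a[ab]*", w) for w in derived))       # True
# ... and every such string (up to MAX_LEN) is derived.
expected = {"a" + "".join(t) for k in range(MAX_LEN) for t in product("ab", repeat=k)}
print(derived == expected)                                    # True

Both checks echo the two answers above: A must be able to derive λ so a derivation can terminate, and even though A => bA exists, every sentence still begins with a because every derivation starts from S => aA.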