Give context-free grammars that generate the following language.
In all parts the alphabet Σ is {x, s}.
{w | w starts and ends with different symbols}
S -> xAs | sAx
A -> xA | sA | xAs | sAx | e
e = epsilon
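One way to sanity-check a grammar like this is to enumerate every string it derives up to a small length and compare the result with the language definition. The following is only a sketch: the dictionary encoding, the helper names `generate` and `expected_language`, and the length bound are assumptions made for illustration, and the simple pruning only terminates for grammars like this one where a sentential form never accumulates nonterminals.

```python
from itertools import product

# Grammar from above; "" stands for epsilon, uppercase letters are nonterminals.
GRAMMAR = {
    "S": ["xAs", "sAx"],
    "A": ["xA", "sA", "xAs", "sAx", ""],
}

def generate(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:                      # no nonterminal left: a finished word
            words.add(form)
            continue
        for rhs in grammar[form[idx]]:       # expand the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(ch not in grammar for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return words

def expected_language(max_len):
    """All strings over {x, s} up to max_len whose first and last symbols differ."""
    return {
        w
        for n in range(2, max_len + 1)
        for w in map("".join, product("xs", repeat=n))
        if w[0] != w[-1]
    }

print(generate(GRAMMAR, "S", 6) == expected_language(6))
```

Agreement up to a bound is evidence, not a proof, but it catches most slips in small grammars.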
I need help with a non-extended BNF grammar:
Σ = {a,b,c}
L = {ω ∈ Σ* | all a's (if any) come before all c's (if any)}
For example, the strings aba, cbc, and abacbc are in the language, but the string abcabc is not.
This is what I have so far (is it correct? Please correct me if I am wrong):
s->asbsc|bsasc|ascsb|ɛ
Your comment says you want equal numbers of a and c, so start with the simple grammar that does that:
S -> aSc | ε
and add in any number of b's before/after/between those:
S -> BaScB | B
B -> Bb | ε
Note that the above is not ambiguous (it's even LR(1)).
If you want to allow a different number of a's and c's, you can use the same approach to avoid ambiguity. Start with just the a's and c's:
S -> AC
A -> Aa | ε
C -> Cc | ε
and add in b's at the beginning and after each other character:
S -> BAC
A -> AaB | ε
C -> CcB | ε
B -> Bb | ε
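Reading the last grammar as a regular expression gives B = b*, A = (ab*)* and C = (cb*)*, so S -> BAC corresponds to b*(ab*)*(cb*)*. A brute-force comparison against the plain-English definition is a quick way to check that reading. This is a sketch only; the function names and the length bound are my own choices.

```python
import re
from itertools import product

# B = b*, A = (a b*)*, C = (c b*)*, so S -> BAC reads as the pattern below.
PATTERN = re.compile(r"b*(ab*)*(cb*)*")

def a_before_c(w):
    """True when no 'a' occurs after the first 'c'."""
    first_c = w.find("c")
    return first_c == -1 or "a" not in w[first_c:]

def mismatches(max_len):
    """Strings over {a, b, c} on which the pattern and the definition disagree."""
    bad = []
    for n in range(max_len + 1):
        for w in map("".join, product("abc", repeat=n)):
            if bool(PATTERN.fullmatch(w)) != a_before_c(w):
                bad.append(w)
    return bad

print(mismatches(7))  # an empty list means the two agree up to length 7
```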
Do the numbers of a's and c's need to be the same? If not, then you are missing the cases where they differ, such as aac. I think something like this should work:
S -> AC
A -> aA | bA | ε
C -> bC | cC | ε
The A production derives a sequence of characters that contains no c, and the C production derives a sequence of characters that contains no a.
I am given the language {w ∈ {a,b}* | |w|a = |w|b + 1} and am asked to find a grammar.
I have come up with the following:
S->aSb | bSa | aAa | bBb | a
A->bS
B->?
and was wondering whether this is correct and, if not, why.
It's not correct, because it cannot generate the valid sentence:
baaab
which has one more a than b. It should be obvious that this sentence cannot be generated, because every sentence generated by your grammar (other than the single a) has different start and end characters.
Edit: The grammar in the edited question is also not correct, because the productions:
S -> ... | aAa | a | ...
A -> bS
are equivalent to (by substituting the RHS of A for its use in S):
S -> ... | abSa | a | ...
which will match as follows:
S -> abSa -> abaa
and abaa has three a's but only one b, so it is not in the language.
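The counting argument is easy to check mechanically. A minimal sketch follows; the helper name `in_language` is made up for this illustration.

```python
def in_language(w):
    """Membership test for { w in {a,b}* : |w|_a = |w|_b + 1 }."""
    return set(w) <= {"a", "b"} and w.count("a") == w.count("b") + 1

for w in ("baaab", "abaa"):
    print(w, w.count("a"), w.count("b"), in_language(w))
# baaab 3 2 True
# abaa 3 1 False
```

So baaab belongs to the language but is missed by the first grammar, while abaa does not belong to the language but is produced by the edited one.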
I need help constructing a left-linear and a right-linear grammar for each of the languages below.
a) (0+1)*00(0+1)*
b) 0*(1(0+1))*
c) (((01+10)*11)*00)*
For a) I have the following:
Left-linear
S --> B00 | S11
B --> B0|B1|011
Right-linear
S --> 00B | 11S
B --> 0B|1B|0|1
Is this correct? I also need help with b) and c).
Constructing an equivalent Regular Grammar from a Regular Expression
First, I start with some simple rules for constructing a Regular Grammar (RG) from a Regular Expression (RE).
I am writing the rules for a Right Linear Grammar (writing similar rules for a Left Linear Grammar is left as an exercise).
NOTE: Capital letters are used for variables and small letters for terminals in the grammar. The NULL symbol is ^. The term 'any number' means zero or more times, that is, the * (star) closure.
[BASIC IDEA]
SINGLE TERMINAL: If the RE is simply e (e being any terminal), the grammar G with the single production rule S --> e (where S is the start symbol) is an equivalent RG.
UNION OPERATION: If the RE is of the form e + f, where both e and f are terminals, the grammar G with the two production rules S --> e | f is an equivalent RG.
CONCATENATION: If the RE is of the form ef, where both e and f are terminals, the grammar G with the two production rules S --> eA, A --> f is an equivalent RG.
STAR CLOSURE: If the RE is of the form e*, where e is a terminal and * is the Kleene star closure operation, the grammar G with the two production rules S --> eS | ^ is an equivalent RG.
PLUS CLOSURE: If the RE is of the form e+, where e is a terminal and + is the Kleene plus closure operation, the grammar G with the two production rules S --> eS | e is an equivalent RG.
STAR CLOSURE ON UNION: If the RE is of the form (e + f)*, where both e and f are terminals, the grammar G with the three production rules S --> eS | fS | ^ is an equivalent RG.
PLUS CLOSURE ON UNION: If the RE is of the form (e + f)+, where both e and f are terminals, the grammar G with the four production rules S --> eS | fS | e | f is an equivalent RG.
STAR CLOSURE ON CONCATENATION: If the RE is of the form (ef)*, where both e and f are terminals, the grammar G with the three production rules S --> eA | ^, A --> fS is an equivalent RG.
PLUS CLOSURE ON CONCATENATION: If the RE is of the form (ef)+, where both e and f are terminals, the grammar G with the three production rules S --> eA, A --> fS | f is an equivalent RG.
Be sure that you understand all of the above rules. Here is the summary table:
+-------------------------------+--------------------+------------------------+
| TYPE                          | REGULAR-EXPRESSION | RIGHT-LINEAR-GRAMMAR   |
+-------------------------------+--------------------+------------------------+
| SINGLE TERMINAL               | e                  | S --> e                |
| UNION OPERATION               | e + f              | S --> e | f            |
| CONCATENATION                 | ef                 | S --> eA, A --> f      |
| STAR CLOSURE                  | e*                 | S --> eS | ^           |
| PLUS CLOSURE                  | e+                 | S --> eS | e           |
| STAR CLOSURE ON UNION         | (e + f)*           | S --> eS | fS | ^      |
| PLUS CLOSURE ON UNION         | (e + f)+           | S --> eS | fS | e | f  |
| STAR CLOSURE ON CONCATENATION | (ef)*              | S --> eA | ^, A --> fS |
| PLUS CLOSURE ON CONCATENATION | (ef)+              | S --> eA, A --> fS | f |
+-------------------------------+--------------------+------------------------+
Note: the symbols e and f are terminals, ^ is the NULL symbol, and S is the start variable.
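A right-linear grammar can be run directly as a small nondeterministic automaton whose states are the variables, which makes it easy to test the rules in the table. The following is a sketch under my own encoding assumptions: each production is a pair (terminals, next variable), with None meaning the derivation stops, and the names `accepts`, `STAR_CONCAT` and `PLUS_UNION` are made up here, with e = 'a' and f = 'b'.

```python
def accepts(grammar, start, word):
    """Decide whether a right-linear grammar derives `word`.

    grammar maps a variable to a list of (terminals, next_variable) pairs;
    next_variable is None when the production ends the derivation.
    """
    seen = set()
    todo = [(start, 0)]                  # (current variable, characters consumed)
    while todo:
        var, pos = todo.pop()
        if (var, pos) in seen:
            continue
        seen.add((var, pos))
        for terminals, nxt in grammar[var]:
            if not word.startswith(terminals, pos):
                continue
            end = pos + len(terminals)
            if nxt is None:
                if end == len(word):
                    return True
            else:
                todo.append((nxt, end))
    return False

# Table row "(ef)*" with e = 'a', f = 'b':  S --> aA | ^,  A --> bS
STAR_CONCAT = {"S": [("a", "A"), ("", None)], "A": [("b", "S")]}
# Table row "(e + f)+":  S --> aS | bS | a | b
PLUS_UNION = {"S": [("a", "S"), ("b", "S"), ("a", None), ("b", None)]}

print(accepts(STAR_CONCAT, "S", "abab"))  # True
print(accepts(STAR_CONCAT, "S", "aba"))   # False
print(accepts(PLUS_UNION, "S", ""))       # False
```

The `seen` set keeps the search finite even when a grammar contains unit productions such as S --> A.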
[ANSWER]
Now we can come to your problem.
a) (0+1)*00(0+1)*
Language description: All strings of 0s and 1s that contain at least one occurrence of 00.
Right Linear Grammar:
S --> 0S | 1S | 00A
A --> 0A | 1A | ^
A string can start with any string of 0s and 1s, which is why the rules S --> 0S | 1S are included. Because every string must contain at least one pair 00, S has no null production. S --> 00A is included because any 0s and 1s can follow the 00; the symbol A takes care of those.
Left Linear Grammar:
S --> S0 | S1 | A00
A --> A0 | A1 | ^
b) 0*(1(0+1))*
Language description: Any number of 0s, followed by any number of repetitions of 10 or 11
{ because 1(0 + 1) = 10 + 11 }
Right Linear Grammar:
S --> 0S | A | ^
A --> 1B
B --> 0A | 1A | 0 | 1
A string starts with any number of 0s, so the rules S --> 0S | ^ are included. The rules A --> 1B and B --> 0A | 1A | 0 | 1 then generate 10 and 11 any number of times.
An alternative right linear grammar is
S --> 0S | A | ^
A --> 10A | 11A | 10 | 11
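Left Linear Grammar:
S --> A | ^
A --> A10 | A11 | B | ^
B --> B0 | 0
The production A --> ^ is needed so that strings with no leading 0, such as 10, can be derived.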
An alternative form is
S --> S10 | S11 | B | ^
B --> B0 | 0
c) (((01+10)*11)*00)*
Language description: First, the language contains the null (^) string, because there is a * (star) outside everything inside the parentheses. Also, if a string in the language is not null, it definitely ends with 00. One can think of this regular expression in the form ( ( (A)* B )* C )*, where (A)* is (01 + 10)*, that is, any number of repetitions of 01 and 10.
If there is an instance of A in the string, there will definitely be a B after it, because of (A)*B, and B is 11.
Some example strings: { ^, 00, 0000, 000000, 1100, 111100, 1100111100, 011100, 101100, 01110000, 01101100, 0101011010101100, 101001110001101100 ....}
Left Linear Grammar:
S --> A00 | ^
A --> B11 | S
B --> B01 | B10 | A
S --> A00 | ^ because any string is either null or, if it's not null, ends with 00. When the string ends with 00, the variable A matches the pattern ((01 + 10)*11)*. Again, this pattern is either null or ends with 11. If it's null, A matches it with S again, i.e. the string ends with a pattern like (00)*. If the pattern is not null, B matches (01 + 10)*. When B has matched all it can, A starts matching the string again. This closes the outermost * in ((01 + 10)*11)*.
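The left linear grammar can be tested by reading a candidate string from right to left: each production strips a suffix and hands the remaining prefix to the next variable. Below is a sketch that compares the grammar with the regular expression on all short binary strings. The encoding (pairs of next variable and trailing terminals) and the names `LEFT_LINEAR`, `derives` and `disagreements` are assumptions for illustration; the recursion relies on this particular grammar having no cycle of unit productions.

```python
import re
from functools import lru_cache
from itertools import product

# Left-linear productions as (variable_or_None, trailing_terminals).
LEFT_LINEAR = {
    "S": [("A", "00"), (None, "")],              # S --> A00 | ^
    "A": [("B", "11"), ("S", "")],               # A --> B11 | S
    "B": [("B", "01"), ("B", "10"), ("A", "")],  # B --> B01 | B10 | A
}
REGEX = re.compile(r"(((01|10)*11)*00)*")

def derives(word, start="S"):
    """True if the left-linear grammar derives `word` (suffixes are peeled off)."""
    @lru_cache(maxsize=None)
    def go(var, end):
        # Does `var` derive the prefix word[:end]?
        for nxt, suffix in LEFT_LINEAR[var]:
            if not word.endswith(suffix, 0, end):
                continue
            rest = end - len(suffix)
            if nxt is None:
                if rest == 0:
                    return True
            elif go(nxt, rest):
                return True
        return False
    return go(start, len(word))

def disagreements(max_len):
    """Binary strings on which the grammar and the regular expression differ."""
    return [
        w
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
        if derives(w) != bool(REGEX.fullmatch(w))
    ]

print(disagreements(10))  # an empty list means they agree up to length 10
```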
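Right Linear Grammar:
S --> A | 00S | ^
A --> 01A | 10A | 11B
B --> A | 00S
After a 11 the derivation moves to B rather than back to S, so that a string cannot stop before its closing 00 (a string such as 11 on its own is not in the language).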
Second part of your question:
For a) I have the following:
Left-linear
S --> B00 | S11
B --> B0|B1|011
Right-linear
S --> 00B | 11S
B --> 0B|1B|0|1
(answer)
Your solutions are wrong for the following reasons:
The left-linear grammar is wrong because the string 0010 cannot be generated.
The right-linear grammar is wrong because the string 1000 cannot be generated. Both strings are in the language generated by the regular expression of question (a).
EDIT
Adding DFAs for each regular expression, in case they are helpful.
a) (0+1)*00(0+1)*
b) 0*(1(0+1))*
c) (((01+10)*11)*00)*
Drawing a DFA for this regular expression is tricky and complex.
To simplify the task, think about the form of the RE:
to me the RE (((01+10)*11)*00)* looks like (a*b)*
(((01+10)*11)* 00 )*
( a* b )*
Actually, in the above expression a is itself of the form (a*b)*,
that is ((01+10)*11)*
The RE (a*b)* is equal to (a + b)*b + ^. The DFA for (a*b)* is as below:
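The identity (a*b)* = (a + b)*b + ^ says that a string is in (a*b)* exactly when it is empty or ends with b. It can be checked by brute force on short strings; this is a sketch, with the function name and the length bound chosen by me.

```python
import re
from itertools import product

LEFT = re.compile(r"(a*b)*")
RIGHT = re.compile(r"((a|b)*b)?")   # "(a + b)*b + ^": the "?" adds the null string

def same_language(max_len):
    """Compare both expressions on every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for w in map("".join, product("ab", repeat=n)):
            if bool(LEFT.fullmatch(w)) != bool(RIGHT.fullmatch(w)):
                return False
    return True

print(same_language(10))
```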
DFA for ((01+10)*11)* is:
DFA for (((01+10)*11)* 00 )* is:
Try to find the similarity in the construction of the above three DFAs. Don't move ahead until you understand the first one.
Rules to convert regular expressions to left or right linear regular grammar
This is a follow up question from Grammar: difference between a top down and bottom up?
I understand from that question that:
the grammar itself isn't top-down or bottom-up, the parser is
there are grammars that can be parsed by one but not the other
(thanks, Jerry Coffin)
So for this grammar (all possible mathematical formulas):
E -> E T E
E -> (E)
E -> D
T -> + | - | * | /
D -> 0
D -> L G
G -> G G
G -> 0 | L
L -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Would this be readable by a top down and bottom up parser?
Could you say that this is a top down grammar or a bottom up grammar (or neither)?
I am asking because I have a homework question that asks:
"Write top-down and bottom-up grammars for the language consisting of all ..." (different question)
I am not sure if this can be correct since it appears that there is no such thing as a top-down and bottom-up grammar. Could anyone clarify?
That grammar is stupid, since it unites lexing and parsing as one. But ok, it's an academic example.
The thing with bottom-up and top-down is that each has special corner cases that are difficult to handle with the usual one token of lookahead. You should probably check whether the grammar has any such problems and change it accordingly.
To understand your grammar I wrote it out as proper EBNF:
expr:
expr op expr |
'(' expr ')' |
number;
op:
'+' |
'-' |
'*' |
'/';
number:
'0' |
digit digits;
digits:
'0' |
digit |
digits digits;
digit:
'1' |
'2' |
'3' |
'4' |
'5' |
'6' |
'7' |
'8' |
'9';
I especially don't like the rule digits: digits digits. It is ambiguous: it is unclear where the first digits ends and the second begins. I would implement the rule as
digits:
'0' |
digit |
digits digit;
Another problem is number: '0' | digit digits; this overlaps with digits: '0' and digits: digit;. As a matter of fact, it is duplication. I would change the rules to (removing digits):
number:
'0' |
digit |
digit zero_digits;
zero_digits:
zero_digit |
zero_digits zero_digit;
zero_digit:
'0' |
digit;
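This makes the number rules left-recursive and unambiguous, and the grammar is context-free. (LR(1) stands for left-to-right scan, rightmost derivation, one token of lookahead; left recursion is simply the form such parsers handle well.) This is what you would normally give to a parser generator such as bison. And since bison generates bottom-up parsers, this is valid input for a bottom-up parser. Note that expr: expr op expr is still ambiguous, so the grammar as a whole is not strictly LR(1) until that rule is disambiguated as well.
For a top-down approach, at least for recursive descent, left recursion is a problem. You can use backtracking if you like, but for these parsers you normally want an LL(1)-style grammar that uses right recursion instead. To get that, swap the recursion: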
zero_digits:
zero_digit |
zero_digit zero_digits;
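To show why the right-recursive form suits recursive descent, here is a minimal sketch of a recognizer for the number rules. The function names are made up for this illustration, and the two alternatives of zero_digits are merged by reading one zero_digit and then deciding with one character of lookahead whether to recurse, which is the usual left-factoring step.

```python
DIGITS = "0123456789"
NONZERO = "123456789"

def parse_zero_digits(text, pos):
    """zero_digits: zero_digit | zero_digit zero_digits  (right-recursive)."""
    if pos >= len(text) or text[pos] not in DIGITS:
        raise ValueError(f"digit expected at position {pos}")
    pos += 1                                   # consume one zero_digit
    if pos < len(text) and text[pos] in DIGITS:
        return parse_zero_digits(text, pos)    # lookahead says: more digits follow
    return pos

def parse_number(text, pos=0):
    """number: '0' | digit | digit zero_digits; returns the position after it."""
    if pos < len(text) and text[pos] == "0":
        return pos + 1
    if pos < len(text) and text[pos] in NONZERO:
        pos += 1
        if pos < len(text) and text[pos] in DIGITS:
            return parse_zero_digits(text, pos)
        return pos
    raise ValueError(f"number expected at position {pos}")

def is_number(text):
    """True when the whole of `text` is a single number."""
    try:
        return parse_number(text) == len(text)
    except ValueError:
        return False

print(is_number("1203"), is_number("012"))  # True False
```

With the left-recursive rule zero_digits: zero_digits zero_digit, the function for zero_digits would have to call itself before consuming any input and would never terminate, which is the problem described above.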
I am not sure if that answers your question. I think the question is badly formulated and misleading; and I write parsers for a living...