BNF grammar associativity - grammar

I'm trying to understand how left- and right-associative grammars work, and I need a little help. So I decided to come up with an example and ask for some clarification.
Basically, I want to create a grammar for two logical operations: and (^) and implication (->). I want and to be left associative and implication to be right associative. This is what I have so far. Is this correct? I feel like it might be ambiguous. (I also kept in mind that and has higher precedence than implication.)
<exp> := <and>
<and> := <impl> | <and> ^ <impl>
<impl> := <term> | <term> -> <impl>
<term> := (<exp>) | <bool>
<bool> := true | false

From my limited knowledge, it seems to me that you got the precedences inverted: in your grammar <impl> is an operand of <and>, so implication binds more tightly than and.
At the grammar level, a left associative operator has the following format:
exp = exp op other | other
...and a right associative operator would have the following format:
exp = other op exp | other
As you can see, it depends on your use of recursion: left associativity would use a left recursive rule while right associativity would use a right recursive one.
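A minimal Python sketch of how the two rule shapes behave when parsed, assuming the operands are single tokens and the input is already split into a token list. A recursive-descent parser cannot start a rule by calling itself on the same input, so the left-recursive rule is written as a loop that folds each new operand onto the tree built so far; the right-recursive rule maps directly onto a recursive call. The function names are illustrative.

```python
def parse_left(tokens, op):
    """exp = exp op other | other: fold operands left to right,
    so a op b op c becomes ((a op b) op c)."""
    tree = tokens[0]
    i = 1
    while i + 1 < len(tokens) and tokens[i] == op:
        tree = (tree, op, tokens[i + 1])  # the tree so far becomes the left operand
        i += 2
    return tree


def parse_right(tokens, op):
    """exp = other op exp | other: take one operand, then recurse on the rest,
    so a op b op c becomes (a op (b op c))."""
    head = tokens[0]
    if len(tokens) > 2 and tokens[1] == op:
        return (head, op, parse_right(tokens[2:], op))
    return head


print(parse_left(["a", "+", "b", "+", "c"], "+"))     # (('a', '+', 'b'), '+', 'c')
print(parse_right(["a", "**", "b", "**", "c"], "**"))  # ('a', '**', ('b', '**', 'c'))
```

The nesting of the returned tuples is exactly the grouping the answer describes: left recursion groups to the left, right recursion to the right.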
As for precedence, the further down the chain of rules an operator's rule appears (that is, the closer it is to the operands), the higher its precedence. In the grammar below, where opL represents a left-associative operator and opR represents a right-associative one, exp0 has lower precedence than exp1, which has lower precedence than other:
exp0 = exp0 opL exp1 | exp1
exp1 = other opR exp1 | other
other = ...
As an example, if opL is "+" and opR is "**" and other is a letter, see how the parse tree for a few expressions would be built:
Left associativity:
a + b + c -> (a + b) + c
exp0 -+-> exp0 -+-> exp0 --> exp1 --> other --> a
      |         |
      |         +-> opL --> "+"
      |         |
      |         \-> exp1 --> other --> b
      |
      +-> opL --> "+"
      |
      \-> exp1 --> other --> c
Right associativity:
a ** b ** c -> a ** (b ** c)
exp0 --> exp1 -+-> other --> a
               |
               +-> opR --> "**"
               |
               \-> exp1 -+-> other --> b
                         |
                         +-> opR --> "**"
                         |
                         \-> exp1 --> other --> c
Precedence:
a + b ** c -> a + (b ** c)
exp0 -+-> exp0 --> exp1 --> other --> a
      |
      +-> opL --> "+"
      |
      \-> exp1 -+-> other --> b
                |
                +-> opR --> "**"
                |
                \-> exp1 --> other --> c
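Applying the answer's scheme to the question's operators, with and binding more tightly than implication, puts the right-recursive implication rule at the lower-precedence level and the left-recursive and rule below it. The Python sketch below is a recursive-descent parser for that rearranged grammar (my reading of the answer, not the asker's original grammar); the class and function names are illustrative, and it returns nested tuples so the grouping is visible.

```python
import re

# Rearranged grammar (a sketch following the answer's scheme):
#   <exp>  ::= <impl>
#   <impl> ::= <and> -> <impl> | <and>    right recursive => right associative
#   <and>  ::= <and> ^ <term> | <term>    left recursive  => left associative
#   <term> ::= ( <exp> ) | true | false
TOKEN = re.compile(r"\s*(->|\^|\(|\)|true|false)")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def impl(self):
        left = self.conj()
        if self.peek() == "->":
            self.advance()
            return ("->", left, self.impl())  # recurse on the right: a -> (b -> c)
        return left

    def conj(self):
        node = self.term()
        while self.peek() == "^":  # loop replaces the left recursion: (a ^ b) ^ c
            self.advance()
            node = ("^", node, self.term())
        return node

    def term(self):
        tok = self.peek()
        if tok == "(":
            self.advance()
            node = self.impl()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return node
        if tok in ("true", "false"):
            self.advance()
            return tok == "true"
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.impl()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return tree


print(parse("true ^ false ^ true"))    # ('^', ('^', True, False), True)
print(parse("true -> false -> true"))  # ('->', True, ('->', False, True))
print(parse("true ^ false -> true"))   # ('->', ('^', True, False), True)
```

The third example shows the precedence: the ^ groups first and becomes the left operand of ->.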

Related

Give context-free grammars that generate the following language

Give context-free grammars that generate the following language.
In all parts the alphabet Σ is {x,s}.
{w | w starts and ends with different symbols}
S -> xAs | sAx
A -> xA | sA | xAs | sAx | e
e = epsilon
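A quick way to gain confidence in a grammar like this is to compare it with the language definition on every short string. The Python sketch below (function names are my own) mirrors each production with a memoized recursive recognizer and checks all strings over {x,s} up to length 8.

```python
from functools import lru_cache
from itertools import product

# S -> xAs | sAx
# A -> xA | sA | xAs | sAx | e      (e = epsilon)


@lru_cache(maxsize=None)
def derives_A(w):
    if w == "":                                # A -> e
        return True
    if w[0] in "xs" and derives_A(w[1:]):      # A -> xA | sA
        return True
    # A -> xAs | sAx: the outer symbols differ and A derives the middle
    return len(w) >= 2 and w[0] != w[-1] and derives_A(w[1:-1])


def derives_S(w):
    # S -> xAs | sAx
    return len(w) >= 2 and w[0] != w[-1] and derives_A(w[1:-1])


def in_language(w):
    # {w | w starts and ends with different symbols}
    return len(w) >= 1 and w[0] != w[-1]


for n in range(9):
    for chars in product("xs", repeat=n):
        w = "".join(chars)
        assert derives_S(w) == in_language(w), w
print("grammar and definition agree on all strings up to length 8")
```

Since A --> xA | sA | e on its own already derives every string over {x,s}, the extra alternatives xAs and sAx do not change the language; they only make the grammar ambiguous (for example, xs can be derived from A in two ways).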

Context Free Grammar BNF

need help with a non-extended BNF grammar:
Σ = {a,b,c}
L = {ω ∈ Σ* | all a's (if any) come before all c's (if any)}
For example, the strings aba, cbc, and abacbc are in the language, but string abcabc is not.
This is what I have so far (is it correct? Please correct me if I am wrong):
s->asbsc|bsasc|ascsb|ɛ
Your comment says you want equal numbers of a and c, so start with the simple grammar that does that:
S -> aSc | ε
and add in any number of b's before/after/between those:
S -> BaScB | B
B -> Bb | ε
Note that the above is not ambiguous (it's even LR(1)).
If you want to allow a different number of a's and c's, you can use the same approach to avoid ambiguity. Start with just the a's and c's:
S -> AC
A -> Aa | ε
C -> Cc | ε
and add in b's at the beginning and after each other character:
S -> BAC
A -> AaB | ε
C -> CcB | ε
B -> Bb | ε
Do the numbers of a's and c's need to be the same? If not, then you are missing the cases where they differ, such as aac. I think something like this should work:
S -> AC
A -> aA | bA | ε
C -> bC | cC | ε
The A production is used for deriving a sequence of characters that are not a c and the C production is used for deriving a sequence of characters that are not an a.
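The grammar S -> AC above is regular: A derives (a|b)* and C derives (b|c)*, so S corresponds to the regular expression (a|b)*(b|c)*. That translation is my own reading, not part of the answer; under it, the Python sketch below brute-forces every string over {a,b,c} up to length 7 and compares it with the definition "no a after a c".

```python
import re
from itertools import product

# S -> AC, A -> aA | bA | ε, C -> bC | cC | ε
# A derives (a|b)*, C derives (b|c)*, so S corresponds to this regex:
GRAMMAR_AS_REGEX = re.compile(r"[ab]*[bc]*")


def all_as_before_cs(w):
    # Language definition: no 'a' occurs at or after the first 'c'.
    first_c = w.find("c")
    return first_c == -1 or "a" not in w[first_c:]


def check(max_len):
    """Return the first string where the two disagree, or None."""
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            w = "".join(chars)
            if bool(GRAMMAR_AS_REGEX.fullmatch(w)) != all_as_before_cs(w):
                return w
    return None


print(check(7))  # None: no disagreement up to length 7
for w in ["aba", "cbc", "abacbc", "aac", "abcabc"]:
    print(w, bool(GRAMMAR_AS_REGEX.fullmatch(w)))  # only abcabc is rejected
```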

Tips for creating "Context Free Grammar"

I am new to CFGs.
Can someone give me tips on creating a CFG that generates a given language?
For example
L = {a^m b^n | m >= n}
What I got is:
S0 -> a | aS0 | aS1 | e
S1 -> b | bS1 | e
but I think this is wrong, because the number of b's can end up greater than the number of a's.
How to write a CFG, with the example a^m b^n
L = {a^m b^n | m >= n}.
Language description: a^m b^n consists of a's followed by b's, where the number of a's is equal to or greater than the number of b's.
Some example strings: {^, a, aa, aab, aabb, aaaab, ab, ...}
So there is always one a for each b, but extra a's are possible. In fact, a string can consist of a's only. Also notice that ^ (null) is a member of the language, because in ^ NumberOf(a) = NumberOf(b) = 0.
How to write a grammar that accepts the language formed by strings a^m b^n?
In the grammar, there should be rules such that whenever you add a b symbol you also add an a symbol,
and this can be done with something like:
S --> aSb
But this is incomplete, because we need a rule to generate the extra a's:
A --> aA | a
Combining the two sets of production rules gives a single CFG:
S --> aSb | A
A --> aA | a
So you can generate any string consisting of a's only, as well as strings of a's and b's in the a^m b^n pattern.
But the grammar above has no way to generate the ^ (null) string.
So change the grammar like this:
S --> B | ^
B --> aBb | A
A --> aA | a
This grammar generates the language {a^m b^n | m >= n}.
Note: to generate the ^ (null) string, I added an extra first step to the grammar, S --> B | ^, so you can derive either ^ or your string of a and b symbols. (Now B plays the role of S from the previous grammar, generating equal numbers of a's and b's plus the extra a's.)
Edit: Thanks to #Andy Hayden
You can also write an equivalent grammar for the same language {a^m b^n | m >= n}:
S --> aSb | A
A --> aA | ^
Notice: here A --> aA | ^ can generate zero or more a's. This should be preferable to my grammar because it generates a smaller parse tree for the same string
(a tree of smaller height is preferable for efficient parsing).
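One way to convince yourself that S --> aSb | A, A --> aA | ^ generates exactly {a^m b^n | m >= n} is to enumerate everything it derives up to a length bound and compare against the definition. The Python sketch below does that by brute force; the generate helper and the dict encoding of the grammar are my own, and termination relies on every sentential form of this grammar containing at most one variable.

```python
from itertools import product

# S --> aSb | A
# A --> aA | ^        (^ is the null string, written here as an empty list)
GRAMMAR = {
    "S": [["a", "S", "b"], ["A"]],
    "A": [["a", "A"], []],
}


def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Expands the leftmost variable and drops any sentential form that already
    has more than max_len terminals.
    """
    results, seen, stack = set(), set(), [(start,)]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        if sum(sym not in grammar for sym in form) > max_len:
            continue
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:  # no variables left: a finished string
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            stack.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return results


def in_language(w):
    # a^m b^n with m >= n
    m = len(w) - len(w.lstrip("a"))
    rest = w[m:]
    return set(rest) <= {"b"} and len(rest) <= m


candidates = {"".join(p) for n in range(9) for p in product("ab", repeat=n)}
expected = {w for w in candidates if in_language(w)}
print(generate(GRAMMAR, "S", 8) == expected)  # True
```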
The following tips may be helpful for writing a grammar for a formal language:
Be clear about the language and what it describes (its meaning/pattern).
Remember solutions to some basic problems (the idea being that you can build new grammars from them).
Write rules for fundamental languages, like the rules I have written for REs to build Right-Linear Grammars in this example. Such rules will help you write grammars for new languages.
A different approach is to first draw an automaton, then convert the automaton to a grammar. There are standard techniques for converting automata into grammars for each class of formal language.
Like a good programmer who learns by reading other people's code, one can learn to write grammars for formal languages by reading grammars written by others.
Also, the grammar you have written is wrong.
You want to create a grammar for the following language:
L = {a^n b^m | m >= n}
That means the number of b's should be greater than or equal to the number of a's,
or you can say that for each b there can be at most one a, not the other way around.
Here is a grammar for this language:
S-> aSb | Sb | b | ab
In this grammar, for each a there is one b, but a b can be generated without generating any a.
You can also try these languages:
L1 = {a^n b^m | m > n}
L2 = {a^n b^m | m >= 2n}
L3 = {a^n b^m | 2m >= n}
L4 = {a^n b^m | m != n}
I am giving a grammar for each language.
for L1
S-> aSb | Sb | b
for L2
S-> aSbb | Sb | ε
for L3
S-> aaSb | Sb | aab | ab | b
for L4
S-> S1 | S2
S1-> aS1b | S1b | b
S2-> aS2b | aS2 | a
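The grammars for L1 and L4 can be sanity-checked the same way as any CFG: write a recognizer whose cases mirror the productions one for one, then compare it with the language definition on all short strings. In this Python sketch (names are illustrative), more_bs follows S1, whose productions are the same as the L1 grammar, and more_as follows S2.

```python
import re
from functools import lru_cache
from itertools import product

# L1: S  -> aSb | Sb | b                 { a^n b^m | m > n }
# L4: S  -> S1 | S2
#     S1 -> aS1b | S1b | b               (more b's than a's)
#     S2 -> aS2b | aS2 | a               (more a's than b's)


@lru_cache(maxsize=None)
def more_bs(w):
    """Does S1 (same productions as the L1 grammar) derive w?"""
    if w == "b":                                                          # S1 -> b
        return True
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b" and more_bs(w[1:-1]):  # S1 -> aS1b
        return True
    return w.endswith("b") and more_bs(w[:-1])                            # S1 -> S1b


@lru_cache(maxsize=None)
def more_as(w):
    """Does S2 derive w?"""
    if w == "a":                                                          # S2 -> a
        return True
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b" and more_as(w[1:-1]):  # S2 -> aS2b
        return True
    return w.startswith("a") and more_as(w[1:])                           # S2 -> aS2


def l4(w):
    return more_bs(w) or more_as(w)  # S -> S1 | S2


def counts(w):
    """Return (n, m) if w has the form a^n b^m, else None."""
    match = re.fullmatch(r"(a*)(b*)", w)
    return (len(match.group(1)), len(match.group(2))) if match else None


for length in range(9):
    for chars in product("ab", repeat=length):
        w = "".join(chars)
        nm = counts(w)
        assert more_bs(w) == (nm is not None and nm[1] > nm[0]), w
        assert l4(w) == (nm is not None and nm[1] != nm[0]), w
print("L1 and L4 grammars agree with their definitions up to length 8")
```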
Least variables: S -> a S b | a S | e
with less variables :
S -> a S b | a S | a b | e

Left-Linear and Right-Linear Grammars

I need help constructing a left-linear and a right-linear grammar for each of the languages below:
a) (0+1)*00(0+1)*
b) 0*(1(0+1))*
c) (((01+10)*11)*00)*
For a) I have the following:
Left-linear
S --> B00 | S11
B --> B0|B1|011
Right-linear
S --> 00B | 11S
B --> 0B|1B|0|1
Is this correct? I need help with b & c.
Constructing an equivalent Regular Grammar from a Regular Expression
First, here are some simple rules to construct a Regular Grammar (RG) from a Regular Expression (RE).
I am writing rules for Right Linear Grammar (leaving as an exercise to write similar rules for Left Linear Grammar)
NOTE: Capital letters are used for variables and lowercase letters for terminals in the grammar. The NULL symbol is ^. The term 'any number' means zero or more times, that is, * (star) closure.
[BASIC IDEA]
SINGLE TERMINAL: If the RE is simply e (e being any terminal), then the grammar G with the single production rule S --> e (where S is the start symbol) is an equivalent RG.
UNION OPERATION: If the RE is of the form e + f, where both e and f are terminals, then G with the two production rules S --> e | f is an equivalent RG.
CONCATENATION: If the RE is of the form ef, where both e and f are terminals, then G with the two production rules S --> eA, A --> f is an equivalent RG.
STAR CLOSURE: If the RE is of the form e*, where e is a terminal and * is the Kleene star closure operation, then G with the two production rules S --> eS | ^ is an equivalent RG.
PLUS CLOSURE: If the RE is of the form e+, where e is a terminal and + is the Kleene plus closure operation, then G with the two production rules S --> eS | e is an equivalent RG.
STAR CLOSURE ON UNION: If the RE is of the form (e + f)*, where both e and f are terminals, then G with the three production rules S --> eS | fS | ^ is an equivalent RG.
PLUS CLOSURE ON UNION: If the RE is of the form (e + f)+, where both e and f are terminals, then G with the four production rules S --> eS | fS | e | f is an equivalent RG.
STAR CLOSURE ON CONCATENATION: If the RE is of the form (ef)*, where both e and f are terminals, then G with the three production rules S --> eA | ^, A --> fS is an equivalent RG.
PLUS CLOSURE ON CONCATENATION: If the RE is of the form (ef)+, where both e and f are terminals, then G with the three production rules S --> eA, A --> fS | f is an equivalent RG.
Make sure you understand all the rules above; here is a summary table:
+-------------------------------+--------------------+------------------------+
| TYPE                          | REGULAR-EXPRESSION | RIGHT-LINEAR-GRAMMAR   |
+-------------------------------+--------------------+------------------------+
| SINGLE TERMINAL               | e                  | S --> e                |
| UNION OPERATION               | e + f              | S --> e | f            |
| CONCATENATION                 | ef                 | S --> eA, A --> f      |
| STAR CLOSURE                  | e*                 | S --> eS | ^           |
| PLUS CLOSURE                  | e+                 | S --> eS | e           |
| STAR CLOSURE ON UNION         | (e + f)*           | S --> eS | fS | ^      |
| PLUS CLOSURE ON UNION         | (e + f)+           | S --> eS | fS | e | f  |
| STAR CLOSURE ON CONCATENATION | (ef)*              | S --> eA | ^, A --> fS |
| PLUS CLOSURE ON CONCATENATION | (ef)+              | S --> eA, A --> fS | f |
+-------------------------------+--------------------+------------------------+
Note: the symbols e and f are terminals, ^ is the NULL symbol, and S is the start variable.
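The rows of this table can be checked mechanically. A right-linear grammar is essentially an NFA (each variable is a state), so the Python sketch below runs each table grammar as an NFA, with e = 0 and f = 1 chosen for illustration, and compares it with Python's re module on every binary string up to length 6. Note that re writes union as | rather than +; the grammar encoding and the accepts helper are my own.

```python
import re
from itertools import product

# A right-linear grammar read as an NFA: every variable is a state.
#   X --> eY : on terminal e, go from state X to state Y
#   X --> e  : on terminal e, go from X to an extra accepting state (None here)
#   X --> ^  : X itself is accepting
# Encoding (illustrative): {variable: [(terminal, next_variable_or_None), ...]},
# with ("", None) standing for X --> ^.


def accepts(grammar, start, w):
    states = {start}
    for ch in w:
        nxt = set()
        for state in states:
            if state is None:
                continue  # the extra accepting state has no outgoing moves
            for terminal, target in grammar[state]:
                if terminal == ch:
                    nxt.add(target)
        states = nxt
    return any(s is None or ("", None) in grammar[s] for s in states)


# Table rows with e = "0" and f = "1"; the RE column is rewritten in re syntax.
ROWS = {
    "e": ({"S": [("0", None)]}, "0"),
    "e + f": ({"S": [("0", None), ("1", None)]}, "0|1"),
    "ef": ({"S": [("0", "A")], "A": [("1", None)]}, "01"),
    "e*": ({"S": [("0", "S"), ("", None)]}, "0*"),
    "e+": ({"S": [("0", "S"), ("0", None)]}, "0+"),
    "(e + f)*": ({"S": [("0", "S"), ("1", "S"), ("", None)]}, "(0|1)*"),
    "(e + f)+": ({"S": [("0", "S"), ("1", "S"), ("0", None), ("1", None)]}, "(0|1)+"),
    "(ef)*": ({"S": [("0", "A"), ("", None)], "A": [("1", "S")]}, "(01)*"),
    "(ef)+": ({"S": [("0", "A")], "A": [("1", "S"), ("1", None)]}, "(01)+"),
}

for name, (grammar, regex) in ROWS.items():
    for n in range(7):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            assert accepts(grammar, "S", w) == bool(re.fullmatch(regex, w)), (name, w)
print("every row matches its regular expression up to length 6")
```

Reading the grammar as an NFA is the same idea as the tip about drawing an automaton first: the two descriptions carry the same information.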
[ANSWER]
Now we can come to your problem.
a) (0+1)*00(0+1)*
Language description: all strings of 0s and 1s that contain 00 (two consecutive 0s) at least once.
Right Linear Grammar:
S --> 0S | 1S | 00A
A --> 0A | 1A | ^
The string can start with any string of 0s and 1s, which is why the rules S --> 0S | 1S are included. Because the string must contain at least one 00, S has no null production. S --> 00A is included to generate that 00, and the symbol A takes care of the 0s and 1s that can follow it.
Left Linear Grammar:
S --> S0 | S1 | A00
A --> A0 | A1 | ^
b) 0*(1(0+1))*
Language description: any number of 0s, followed by any number of 10s and 11s (in any order).
{ because 1(0 + 1) = 10 + 11 }
Right Linear Grammar:
S --> 0S | A | ^
A --> 1B
B --> 0A | 1A | 0 | 1
The string starts with any number of 0s, so the rules S --> 0S | ^ are included; then the rules A --> 1B and B --> 0A | 1A | 0 | 1 generate 10 and 11 any number of times.
An alternative right-linear grammar is:
S --> 0S | A | ^
A --> 10A | 11A | 10 | 11
Left Linear Grammar:
S --> A | ^
A --> A10 | A11 | B
B --> B0 | ^
An alternative form can be
S --> S10 | S11 | B | ^
B --> B0 | 0
c) (((01+10)*11)*00)*
Language description: First, the language contains the null (^) string, because there is a * (star) outside everything inside the parentheses. Also, every non-null string in the language ends with 00. One can think of this regular expression as having the form ( ( (A)* B )* C )*, where (A)* is (01 + 10)*, that is, any number of repetitions of 01 and 10.
If there is an instance of A in a string, a B must follow it, because of (A)*B, and B is 11 (C is 00).
Some example strings { ^, 00, 0000, 000000, 1100, 111100, 1100111100, 011100, 101100, 01110000, 01101100, 0101011010101100, 101001110001101100 ....}
Left Linear Grammar:
S --> A00 | ^
A --> B11 | S
B --> B01 | B10 | A
S --> A00 | ^ because any string in the language is either null or, if it is not null, ends with 00. A derives everything before that final 00: a string of the language followed by zero or more blocks of the pattern (01 + 10)*11. If there are no such blocks, A --> S hands control back to S, which handles the outermost *. Otherwise the string ends with 11, matched by A --> B11, and B then matches (01 + 10)* by removing 01 or 10 from the right. When B has matched all it can, B --> A hands control back to A, which closes the inner * in ((01 + 10)*11)*.
Right Linear Grammar:
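S --> A | 00S | ^
A --> 01A | 10A | 11C
C --> A | 00S
After an 11, C either starts another (01 + 10)*11 block or closes the current block with 00, so a non-null string can only end with 00.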
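As a sanity check of grammars in this answer, the Python sketch below tests them by brute force against Python's re module on all binary strings up to length 8: the right- and left-linear grammars for (a), the right-linear grammar and the alternative left-linear grammar for (b), and the left-linear grammar for (c). A right-linear variable is matched by peeling terminals off the front of the string, a left-linear one by peeling them off the back. The encoding and helper names are illustrative, and the helper assumes there is no cycle of variable-only (unit) productions, which holds for these grammars.

```python
import re
from functools import lru_cache
from itertools import product

# {variable: [(terminals, next_variable_or_None), ...]}; ("", None) is X --> ^.
# Right-linear reading: X --> terminals next.  Left-linear: X --> next terminals.


def make_matcher(grammar, left_linear):
    @lru_cache(maxsize=None)
    def derives(var, w):
        for word, nxt in grammar[var]:
            if left_linear:
                if not w.endswith(word):
                    continue
                rest = w[: len(w) - len(word)]  # peel terminals off the back
            else:
                if not w.startswith(word):
                    continue
                rest = w[len(word):]  # peel terminals off the front
            if nxt is None:
                if rest == "":
                    return True
            elif derives(nxt, rest):
                return True
        return False

    return derives


# (a) S --> 0S | 1S | 00A, A --> 0A | 1A | ^   (right-linear)
#     S --> S0 | S1 | A00, A --> A0 | A1 | ^   (left-linear, same shape)
GRAMMAR_A = {"S": [("0", "S"), ("1", "S"), ("00", "A")],
             "A": [("0", "A"), ("1", "A"), ("", None)]}
# (b) S --> 0S | A | ^, A --> 1B, B --> 0A | 1A | 0 | 1
GRAMMAR_B_RIGHT = {"S": [("0", "S"), ("", "A"), ("", None)],
                   "A": [("1", "B")],
                   "B": [("0", "A"), ("1", "A"), ("0", None), ("1", None)]}
# (b) S --> S10 | S11 | B | ^, B --> B0 | 0
GRAMMAR_B_LEFT = {"S": [("10", "S"), ("11", "S"), ("", "B"), ("", None)],
                  "B": [("0", "B"), ("0", None)]}
# (c) S --> A00 | ^, A --> B11 | S, B --> B01 | B10 | A
GRAMMAR_C_LEFT = {"S": [("00", "A"), ("", None)],
                  "A": [("11", "B"), ("", "S")],
                  "B": [("01", "B"), ("10", "B"), ("", "A")]}

CASES = [
    ("(0|1)*00(0|1)*", GRAMMAR_A, False),
    ("(0|1)*00(0|1)*", GRAMMAR_A, True),
    ("0*(1(0|1))*", GRAMMAR_B_RIGHT, False),
    ("0*(1(0|1))*", GRAMMAR_B_LEFT, True),
    ("(((01|10)*11)*00)*", GRAMMAR_C_LEFT, True),
]

strings = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
for regex, grammar, left in CASES:
    derives = make_matcher(grammar, left)
    for w in strings:
        assert derives("S", w) == bool(re.fullmatch(regex, w)), (regex, left, w)
print("all five grammars agree with their regular expressions up to length 8")
```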
Second part of you question:
For a) I have the following:
Left-linear
S --> B00 | S11
B --> B0|B1|011
Right-linear
S --> 00B | 11S
B --> 0B|1B|0|1
(answer)
Your solutions are wrong for the following reasons:
The left-linear grammar is wrong because the string 0010 cannot be generated.
The right-linear grammar is wrong because the string 1000 cannot be generated. Yet both strings are in the language generated by the regular expression in (a).
EDIT
Adding DFAs for each regular expression, in case they are helpful.
a) [Figure: DFA for (0+1)*00(0+1)*]
b) [Figure: DFA for 0*(1(0+1))*]
c) [Figure: DFA for (((01+10)*11)*00)*]
Drawing the DFA for this regular expression is tricky and complex,
which is why I wanted to add the DFAs.
To simplify the task, think about how the RE is formed.
To me, the RE (((01+10)*11)*00)* looks like (a*b)*:
(  ((01+10)*11)*   00  )*
(       a*         b   )*
Actually, in the expression above, a is itself of the form (a*b)*,
that is, ((01+10)*11)*.
The RE (a*b)* is equal to (a + b)*b + ^.
[Figure: DFA for (a*b)*]
[Figure: DFA for ((01+10)*11)*]
[Figure: DFA for (((01+10)*11)* 00)*]
Try to find the similarity in the construction of the three DFAs above. Don't move ahead until you understand the first one.
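The DFA diagrams are not available here. As a substitute, the Python sketch below writes out one DFA for (c) as a transition table; the state names and the construction (tracking the (a*b)* structure described above, one pair of symbols at a time) are my own, not the author's figure, and the table is checked against Python's re module for all binary strings up to length 10.

```python
import re
from itertools import product

# One possible DFA for (((01+10)*11)*00)*, written as a transition table.
# State names are my own:
#   S:    between 00-terminated blocks (start state, the only accepting state)
#   B:    inside ((01+10)*11)* after a 01 or 10; an 11 must come before 00
#   C:    inside ((01+10)*11)* right after an 11; 00 may close the block
#   P0:   read the first 0 of a pair in state S or C (00 -> S, 01 -> B)
#   P1:   read the first 1 of a pair in state S, B or C (10 -> B, 11 -> C)
#   B0:   read the first 0 of a pair in state B (only 01 is allowed)
#   DEAD: no continuation can be accepted
DFA = {
    "S": {"0": "P0", "1": "P1"},
    "C": {"0": "P0", "1": "P1"},
    "B": {"0": "B0", "1": "P1"},
    "P0": {"0": "S", "1": "B"},
    "P1": {"0": "B", "1": "C"},
    "B0": {"0": "DEAD", "1": "B"},
    "DEAD": {"0": "DEAD", "1": "DEAD"},
}
ACCEPTING = {"S"}


def run(w, state="S"):
    for ch in w:
        state = DFA[state][ch]
    return state in ACCEPTING


for n in range(11):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert run(w) == bool(re.fullmatch(r"(((01|10)*11)*00)*", w)), w
print("DFA agrees with the regular expression up to length 10")
```

States S and C have the same transitions and differ only in acceptance, which reflects the (a*b)* shape: after a complete 11 block you are allowed to close with 00, but you have not closed yet.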
Rules to convert regular expressions to left or right linear regular grammar

First & Follow, Arithmetic Expressions

FIRST(A) = { b, epsilon }
FIRST(S) = { b, epsilon }
FOLLOW(S) = { a, $ }
FOLLOW(A) = { a, b, $ }
What is the Arithmetic Expressions for this First & Follow set?
FIRST(X) = the terminals that can appear first when trying to parse the non-terminal X. If X can match the empty string, epsilon is also included.
FOLLOW(X) = the terminals that can appear immediately after the non-terminal X. For every rule Y → α X β, it contains FIRST(β) without epsilon; if β can match the empty string (or is empty), it also contains FOLLOW(Y). The end-of-input marker $ is in FOLLOW of the start symbol.
Read more: LL parser
The clues given are:
FIRST(A), FIRST(S) ⇒ All of the derivations of A and S respectively, must either begin with the terminal b, or be zero-length.
S → b ... | ε
A → b ... | ε
FOLLOW(S) ⇒ There must be some construction where S is followed by the terminal a, or a non-terminal which can begin with a. (Neither A nor S qualify).
S → b S a | ε
A → b ... | ε
FOLLOW(A) ⇒ There must be some construction where A is followed by each of the terminals a and b, or some non-terminal which can begin with those.
S → b S a | ε
A → b A b | b A a | ε
FOLLOW(A) ⇒ Assuming S is the start-symbol, A must appear at the end of some branch of S, possibly followed by other nullable non-terminals.
S → b S a | A | ε
A → b A b | b A a | ε
(NB. Adding A to S did not break the constraint on FIRST(S))
We can make the grammar a little smaller:
S → b S a | A | ε
A → b A b | ε
We can no longer generate strings like "bbbabb", but it does not violate the constraints.
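The sets in the question can be double-checked mechanically. This Python sketch uses the standard fixed-point computation of FIRST and FOLLOW on the final, smaller grammar; the dict encoding, the function names and the "ε"/"$" markers are my own choices.

```python
EPS, END = "ε", "$"

# S → b S a | A | ε
# A → b A b | ε
GRAMMAR = {
    "S": [["b", "S", "a"], ["A"], []],  # [] is the ε-production
    "A": [["b", "A", "b"], []],
}


def first_of_seq(seq, first, grammar):
    """FIRST of a symbol sequence; contains EPS if every symbol is nullable."""
    out = set()
    for sym in seq:
        if sym not in grammar:  # a terminal starts the sequence
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                new = first_of_seq(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rules in grammar.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    tail = first_of_seq(rhs[i + 1:], first, grammar)
                    new = tail - {EPS}
                    if EPS in tail:  # sym can end the rule: inherit FOLLOW(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = first_sets(GRAMMAR)
follow = follow_sets(GRAMMAR, "S", first)
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
# S ['b', 'ε'] ['$', 'a']
# A ['b', 'ε'] ['$', 'a', 'b']
```

The printed sets match the FIRST and FOLLOW sets given in the question.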