Left-Linear and Right-Linear Grammar for a simple Regular Expression - grammar

I am having trouble coming up with a left-linear and right-linear grammar for the following regular expression.
0(0+1)*+10^+
I am also quite confused on what the plus-closure does.
This is what I got for the left linear grammar, but I am not sure if this is correct:
P: S--> 0A | 1A
A--> A0|A1|0S|0| epsilon
Thank you!

A good general way to find left- and right-linear grammars is to find an NFA that accepts the same language as your regex, then convert that NFA into a left- or right-linear grammar using the following mechanical transformation:
For each state q, introduce a nonterminal Tq.
For each transition (q, r) on character a (or where a = ε), add the production Tq → aTr (for right-linear grammars) or Tr → Tqa (for left-linear grammars).
Then, for right-linear grammars:
For each accepting state q, add the production Tq → ε.
Make the start symbol Tq0, where q0 is the start state.
Then, for left-linear grammars:
Add a start symbol S with the production S → Tq for each accepting state q.
Add the production Tq0 → ε for the start state q0.
Try applying this idea here and you'll end up producing left- and right-linear grammars for your language. They might not be the most efficient grammars, but they'll work.
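As a rough sketch of that transform, here is one way it could look in Python. The NFA below (states q0 to q3) is my own guess at an automaton for 0(0+1)* + 10⁺, not something taken from the question, and the function names are made up for illustration.

```python
# Hypothetical NFA for 0(0+1)* + 10+ ; the state names are illustrative only.
START = "q0"
ACCEPTING = {"q1", "q3"}
TRANSITIONS = [
    ("q0", "0", "q1"), ("q1", "0", "q1"), ("q1", "1", "q1"),  # 0(0+1)*
    ("q0", "1", "q2"), ("q2", "0", "q3"), ("q3", "0", "q3"),  # 10+
]

def right_linear(start, accepting, transitions):
    """Productions of the form Tq -> a Tr, plus Tq -> ε for accepting q."""
    prods = [(f"T{q}", (a, f"T{r}")) for q, a, r in transitions]
    prods += [(f"T{q}", ("ε",)) for q in sorted(accepting)]
    return f"T{start}", prods

def left_linear(start, accepting, transitions):
    """Productions of the form Tr -> Tq a, plus S -> Tq and Tq0 -> ε."""
    prods = [(f"T{r}", (f"T{q}", a)) for q, a, r in transitions]
    prods += [("S", (f"T{q}",)) for q in sorted(accepting)]
    prods.append((f"T{start}", ("ε",)))
    return "S", prods

def show(start_symbol, prods):
    """Print a grammar, one production per line."""
    print("start symbol:", start_symbol)
    for lhs, rhs in prods:
        print(f"  {lhs} → {' '.join(rhs)}")

show(*right_linear(START, ACCEPTING, TRANSITIONS))
show(*left_linear(START, ACCEPTING, TRANSITIONS))
```

The right-hand sides are kept as tuples of symbols so that a nonterminal such as Tq1 followed by the terminal 0 cannot be misread when printed.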

Definition of First and Follow sets of the right-hand sides of production

I am learning about LL(1) grammars. My task is to check whether a grammar is LL(1) and, if it is not, to find the rules that prevent it from being LL(1). I came across this link https://www.csd.uwo.ca/~mmorenom/CS447/Lectures/Syntax.html/node14.html which has a theorem that can be used as a criterion for deciding whether a grammar is LL(1). It says that for any rule A -> alpha | beta, certain conditions on the FIRST and FOLLOW sets must hold. Therefore, I need to find the FIRST and FOLLOW sets of these right-hand sides of productions.
Let's say, I have following rules A -> a b B S | eps. How do I calculate FIRST and FOLLOW of a b B S? As far as I understand by definition these sets are defined only for 1 non-terminal symbol.
The idea behind the FIRST function is that it returns the set of terminals which could possibly start the expansion of its argument. It's usual to also add the special object ε (which is a way of writing an empty sequence of symbols) if ε is a possible expansion.
So if a is a terminal, FIRST(a) is just { a }. And if A is a non-terminal, FIRST(A) is the set of terminals which could possibly appear at the beginning of a derivation of A (plus ε if A can derive the empty sequence). Finally, FIRST(ε) must be { ε }, according to the convention described above.
Now suppose α is a (possibly empty) sequence of grammar symbols:
If α is empty (that is, it's ε), FIRST(α) is { ε }
If the first symbol in α is the terminal a, FIRST(α) is { a }.
If the first symbol in α is the non-terminal A, there are two possibilities. Let TAIL(α) be the rest of α after the first symbol. Now:
if ε ∈ FIRST(A), then FIRST(α) is (FIRST(A) − { ε }) ∪ FIRST(TAIL(α)).
otherwise, FIRST(α) is FIRST(A).
Now, how do we compute FIRST(A), for every non-terminal A? Using the above definition of FIRST(α), we recursively define FIRST(A) to be the union of the sets FIRST(α) for every α which is the right-hand side of a production A → α.
The FOLLOW function defines the set of terminal symbols which might appear after the expansion of a non-terminal. It is only defined on non-terminals; if you look carefully at the LL(1) conditions on the page you cite, you'll see that FIRST is applied to a right-hand side, while FOLLOW is only applied to left-hand sides.
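A minimal sketch of the FIRST computation described above, written as a fixed-point iteration. The grammar is a made-up example (it is not the one from the question, which does not define B and S), and the names `first_of_sequence` and `compute_first` are my own. Note that ε from a nullable leading symbol is only kept when everything after it can vanish too.

```python
EPS = "ε"

# Hypothetical grammar for illustration; dictionary keys are the non-terminals.
GRAMMAR = {
    "S": [["A", "c"]],
    "A": [["a", "b", "B"], []],   # [] stands for the ε production
    "B": [["d"], []],
}

def first_of_sequence(seq, first):
    """FIRST of a sequence of symbols, given FIRST sets for the non-terminals."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}  # terminal a: { a }
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # this symbol cannot vanish, so stop here
    result.add(EPS)                # empty sequence, or every symbol can vanish
    return result

def compute_first(grammar):
    """Grow the FIRST sets until no production adds anything new."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
print(FIRST)
print(first_of_sequence(["a", "b", "B", "S"], FIRST))  # FIRST of a whole right-hand side
```

Once the FIRST sets of the non-terminals are known, `first_of_sequence` can be applied to any right-hand side, which is what the LL(1) condition needs.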

Formal grammar and arity

I have the following grammar:
S --> LR .
L --> aL .
R --> bR .
This grammar generates the language a^n b^k, where n,k > 0.
I want a grammar that generates the language a^n b^n where n>0, so
my goal is to obtain a grammar that ensures the number of a's always equals the number of b's, while still keeping the non-terminals L and R.
Is there a way to do this?
In a context-free grammar, the derivations of L and R in S → L R are independent of each other. That is what "context free" means: the derivation of a non-terminal is not affected by the context in which the non-terminal occurs.
So if you want a grammar in which L and R must derive strings of equal length, it will have to be a context-sensitive grammar. No context-free grammar can do that.
Of course, there is a simple CFG for the language:
S → a b
S → a S b
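To see why the recursive rule S → a S b keeps the counts equal, here is a small sketch that prints the sentential forms of a derivation. Each step adds one a on the left and one b on the right. The base case S → a b (so that n > 0) and the function name `derive` are assumptions made for the illustration.

```python
def derive(n):
    """Return the sentential forms in a derivation of a^n b^n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S → a S b
    forms.append(forms[-1].replace("S", "ab"))       # assumed base case S → a b
    return forms

print(" ⇒ ".join(derive(3)))
```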

Difference between grammar rules

Say there are two grammar rules
Rule 1 B -> aB | cB
and
Rule 2 B -> Ba | Bc
I'm a bit confused about the difference between these two. Would rule 1's expression be (a+c)* ? Then what would Rule 2's expression be?
Both of those grammars yield the empty language since there is no non-recursive rule, so no sentence consisting only of terminals can be derived.
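If you add the production B→ε, both grammars would yield the same language, equivalent to the regular expression (a+c)*. However, the parse trees would be quite different: Rule 1 is right-recursive, so its trees lean to the right, while Rule 2 is left-recursive, so its trees lean to the left.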
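A small sketch of how the two parse trees differ for the same input, assuming the extra production B → ε has been added to both rules. Trees are shown as nested tuples, and the function names are invented for this example.

```python
def right_recursive_tree(s):
    """Parse tree under B → aB | cB | ε: the recursion sits on the right."""
    if any(ch not in "ac" for ch in s):
        raise ValueError("input must consist of a and c only")
    if not s:
        return ("B", "ε")
    return ("B", s[0], right_recursive_tree(s[1:]))

def left_recursive_tree(s):
    """Parse tree under B → Ba | Bc | ε: the recursion sits on the left."""
    if any(ch not in "ac" for ch in s):
        raise ValueError("input must consist of a and c only")
    if not s:
        return ("B", "ε")
    return ("B", left_recursive_tree(s[:-1]), s[-1])

print(right_recursive_tree("ac"))
print(left_recursive_tree("ac"))
```

Both functions accept exactly the strings of (a+c)*, but the first nests towards the end of the input and the second towards the beginning.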

type3-only lexers in ANTLR4?

I'm thinking about using ANTLR in my lecture on formal languages since its input language is pretty clean and easy to learn.
Since I am not an expert in using ANTLR, I tried some standard examples to get familiar with its syntax, error messages etc.
Doing so I found out, that:
lexer grammar KFG;
R : 'a'R'b' | 'ab';
is a valid lexer that can be executed e.g. by:
echo "aaabbb" | grun KFG tokens -tokens
Since the grammar is context free, it should only be parsable by a parser and not by a lexer.
Is there any way to force ANTLR to accept only type 3 grammars for lexers?
Cheers,
Alex
Is there any way to force ANTLR to accept only type 3 grammars for lexers?
AFAIK, no, that is not possible.

Computation of follow set

To compute FOLLOW(A) for all non-terminals A, apply the following rules
until nothing can be added to any FOLLOW set.
Place $ in FOLLOW(S) , where S is the start symbol, and $ is the input
right endmarker .
If there is a production A -> aBb, then everything in FIRST(b) except epsilon
is in FOLLOW(B) .
If there is a production A -> aB, or a production A -> aBb, where
FIRST(b) contains epsilon, then everything in FOLLOW(A) is in FOLLOW(B).
a,b is actually alpha and beta(sentential form). This is from dragon book.
Now my question is in this case can we take a=epsilon ?
and can b (beta) be 2 non-terminals like XY? (If it is a sentential form, then it should be.)
Here's what the Dragon book actually says: [See note 1]
Place $ in FOLLOW(S).
For every production A→αBβ, place everything
in FIRST(β) except ε into
FOLLOW(B)
For every production A→αB or
A→αBβ where FIRST(β) contains
ε, place FOLLOW(A) into
FOLLOW(B).
There is a section earlier in the book on "notational conventions" in which it is made clear that a lower-case greek letter like α or β represents a possibly empty string of grammar symbols. So, yes, α could be empty and β could be two nonterminals (or any other string of grammar symbols).
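The three rules quoted above can be sketched as a fixed-point loop. This is only an illustration under my own assumptions: the grammar is invented so that α is empty and β is the two non-terminals X Y in the production S → A X Y, and the helper names are not from the Dragon book.

```python
EPS, END = "ε", "$"

# Hypothetical grammar: S → A X Y, A → a | ε, X → x | ε, Y → y | ε
GRAMMAR = {
    "S": [["A", "X", "Y"]],
    "A": [["a"], []],
    "X": [["x"], []],
    "Y": [["y"], []],
}

def first_of(seq, first):
    """FIRST of a (possibly empty) string of grammar symbols."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})      # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # rule 1: $ in FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:       # FOLLOW is only for non-terminals
                        continue
                    beta_first = first_of(rhs[i + 1:], first)
                    new = beta_first - {EPS}     # rule 2: FIRST(β) except ε
                    if EPS in beta_first:        # rule 3: β is empty or nullable
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "S")
print(FOLLOW)
```

For A in S → A X Y, α is empty and β is X Y. Since both X and Y can derive ε here, FOLLOW(A) picks up x and y from FIRST(X Y) and also everything in FOLLOW(S).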
Note:
Here I'm using a variant on the formatting suggestion made by @leftroundabout in this meta post. (The only difference is that I put the formulae in bold.) It's easy to type Greek letters as entities if you don't have a Greek keyboard handy; just use, for example, &alpha; (α) or &beta; (β). For upper-case Greek letters, write the name with an upper-case letter: &Sigma; (Σ). Other useful symbols are arrows: &rarr; (→) and &rArr; (⇒).