Finding context-free grammar - grammar

I've had quite a problem with this task:
L = {w element of {a,b}* |
the number of a's plus 2 times the number of b's modulo 5 in w is 0}
I thought about:
S -> ε
S -> abbS
S -> babS
S -> bbaS
S -> aaaaaS
S -> aaabS
etc...
But that can't be the optimal solution, since you'd also have to shift the positions of the S and it would generate way too many cases. Also, it would just be an enumeration of the cases rather than a "general solution", which is clearly not the goal.

I'd suggest introducing auxiliary non-terminal symbols:
M5 = words w in which the number of a's plus 2 times the number of b's is exactly 5
M4 = words w in which the number of a's plus 2 times the number of b's is exactly 4
M3 = words w in which the number of a's plus 2 times the number of b's is exactly 3
M2 = words w in which the number of a's plus 2 times the number of b's is exactly 2
Then the grammar can be expressed as follows:
S -> ε
S -> M5 S
M5 -> a M4
M5 -> M4 a
M5 -> b M3
M5 -> M3 b
M4 -> a M3
M4 -> M3 a
M4 -> b M2
M4 -> M2 b
M3 -> a M2
M3 -> M2 a
M3 -> b a
M3 -> a b
M2 -> a a
M2 -> b
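Since only the running value of (number of a's + 2 times number of b's) mod 5 matters, another option is a right-linear grammar with one non-terminal per residue. The sketch below is my own assumption, not the grammar above: the names R0..R4, build_productions and derives are made up for illustration. It also covers words such as bbbbb (weight 10), which as far as I can tell cannot be cut into consecutive blocks of weight exactly 5.

```python
from itertools import product

MOD = 5
WEIGHT = {"a": 1, "b": 2}  # an a counts once, a b counts twice


def build_productions():
    """R_r derives every word w with (r + weight(w)) % MOD == 0."""
    rules = {}
    for r in range(MOD):
        # R_r -> a R_{r+1} | b R_{r+2}   (indices taken mod 5)
        rules[f"R{r}"] = [(ch, f"R{(r + w) % MOD}") for ch, w in WEIGHT.items()]
    rules["R0"].append(("", None))  # R0 -> ε
    return rules


def derives(rules, word, start="R0"):
    """Follow the (deterministic) right-linear rules along `word`."""
    state = start
    for ch in word:
        state = next(nt for t, nt in rules[state] if t == ch)
    return ("", None) in rules[state]


def in_language(word):
    """Direct check of the definition of L."""
    return (word.count("a") + 2 * word.count("b")) % MOD == 0


rules = build_productions()
# Compare grammar and definition on every word over {a, b} up to length 7.
for n in range(8):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert derives(rules, w) == in_language(w)
```

The start symbol is R0, so the whole grammar has ten productions plus R0 -> ε.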


Show values that don't match between two columns of 2 data frames in pandas

I would like to see the non-matching values of two columns of 2 data frames and, if possible, do something like a distinct count of those values so that they aren't shown repeated.
I got these columns from 2 data frames:
df1:
ID
M1
M1
M2
M2
M3
M4
M5
M5
M6
M6
df2:
ID
M1
M1
M2
M2
M3
M3
expected result:
Output:
M4
M5
M6
Thanks!
You can combine isin with unique to get what is in df1 but not in df2. Select the ID column before calling unique, since unique is a Series method:
out = df1.loc[~df1.ID.isin(df2.ID), 'ID'].unique()
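For reference, here is a small self-contained sketch of the isin approach using the data from the question. The variable names mask and count are just illustrative, and nunique is used for the distinct count the question mentions.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["M1", "M1", "M2", "M2", "M3",
                           "M4", "M5", "M5", "M6", "M6"]})
df2 = pd.DataFrame({"ID": ["M1", "M1", "M2", "M2", "M3", "M3"]})

# True for rows of df1 whose ID never appears in df2
mask = ~df1["ID"].isin(df2["ID"])

# Distinct non-matching values, in order of first appearance
out = df1.loc[mask, "ID"].unique()

# Number of distinct non-matching values
count = df1.loc[mask, "ID"].nunique()

print(list(out), count)
```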

Context Free Grammar BNF

need help with a non-extended BNF grammar:
Σ = {a,b,c}
L = {ω ∈ Σ^* | such that all a's (if any) come before all c's (if any)}
For example, the strings aba, cbc, and abacbc are in the language, but string abcabc is not.
This is what i have so far (is it correct ? please correct me if i am wrong):
s->asbsc|bsasc|ascsb|ɛ
Your comment says you want equal numbers of a and c, so start with the simple grammar that does that:
S -> aSc | ε
and add in any number of b's before/after/between those:
S -> BaScB | B
B -> Bb | ε
note that the above is not ambiguous (it's even LR(1)).
If you want to allow a different number of a's and c's, you can use the same approach to avoid ambiguity. Start with just the a's and c's:
S -> AC
A -> Aa | ε
C -> Cc | ε
and add in b's at the beginning and after each other character:
S -> BAC
A -> AaB | ε
C -> CcB | ε
B -> Bb | ε
Do the number of a's and c's need to be the same? If not, then you are missing those cases where they differ, such as: aac. I think something like this should work:
S -> AC
A -> aA | bA | ε
C -> bC | cC | ε
The A production is used for deriving a sequence of characters that are not a c and the C production is used for deriving a sequence of characters that are not an a.
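The S -> AC grammar reads as "a run of non-c characters followed by a run of non-a characters", which corresponds to the regular expression [ab]*[bc]*. The following sketch (function names are mine, purely illustrative) compares that expression against the plain-English definition on all short strings and on the examples from the question.

```python
import re
from itertools import product

PATTERN = re.compile(r"[ab]*[bc]*")  # A part, then C part


def by_grammar(word):
    """Membership according to S -> AC, expressed as a regex."""
    return PATTERN.fullmatch(word) is not None


def by_definition(word):
    """All a's (if any) come before all c's (if any)."""
    if "a" not in word or "c" not in word:
        return True
    return word.rfind("a") < word.find("c")


# Exhaustive comparison over {a, b, c}* up to length 7.
for n in range(8):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        assert by_grammar(w) == by_definition(w)

examples = {w: by_grammar(w) for w in ["aba", "cbc", "abacbc", "abcabc", "aac"]}
print(examples)
```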

Find a grammar of binary number divisible by 5 with 1 as MSB

How can I find a grammar of binary number divisible by 5 with 1 as MSB and find the reversal of L
So, I need a grammar that generates numbers like..
5 = 101
10 = 1010
15 = 1111
20 = 10100
25 = 11001
and so on
I'm assuming this is homework and you just want a hint.
Let's consider a somewhat similar question, but in base 10. How can we write a CFG for numbers divisible by 3.
At first glance, this seems unlikely, but it's actually pretty simple. We start with the observation that:
10^k ≡ 1 (mod 3) for any non-negative integer k.
Now consider an integer dδ, where d is a decimal digit and δ is a (possibly empty) sequence of decimal digits. We write |δ| for the length of δ. It's clear that:
d × 10^|δ| ≡ d (mod 3), since 10^|δ| ≡ 1 (mod 3).
But dδ = d × 10^|δ| + δ
So dδ ≡ d + δ (mod 3).
Now suppose we have three languages, L0, L1 and L2, where Li is the language of all decimal numbers which are i mod 3.
I'm going to abuse notation by writing set inclusion statements as though they were grammatical productions, confusing languages and grammars. Forgive me. It seems easier to read if your focus is on CFGs.
For single digit numbers, we can define:
D0 → 0 | 3 | 6 | 9
D1 → 1 | 4 | 7
D2 → 2 | 5 | 8
and then we have:
L0 → D0
L1 → D1
L2 → D2
By the above arithmetic identities, we also have:
L0 → D0 L0 | D1 L2 | D2 L1
L1 → D0 L1 | D1 L0 | D2 L2
L2 → D0 L2 | D1 L1 | D2 L0
That's a CFG, so we're done. (Well, almost done. It would be useful to prove that L0 ⋃ L1 ⋃ L2 is the set of all non-empty strings in the alphabet {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, but that's easy by induction on the length of the string.)
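As a sanity check of the productions above, here is a minimal sketch of a recognizer that follows them literally. The function name derives and the table DIGIT_CLASS are my own; the code only assumes the rules Li → Di and Li → Dd Lj with d + j ≡ i (mod 3).

```python
from functools import lru_cache

# Which of D0, D1, D2 a digit belongs to
DIGIT_CLASS = {d: int(d) % 3 for d in "0123456789"}


@lru_cache(maxsize=None)
def derives(i, s):
    """True if non-terminal L_i derives the digit string s."""
    if not s:
        return False              # no production yields the empty string
    d = DIGIT_CLASS[s[0]]
    if len(s) == 1:
        return d == i             # L_i -> D_i
    # L_i -> D_d L_j, where j is forced by d + j ≡ i (mod 3)
    return derives((i - d) % 3, s[1:])


# Every decimal numeral lands in exactly the class of its value mod 3.
for n in range(1000):
    s = str(n)
    assert derives(n % 3, s)
    assert not derives((n + 1) % 3, s)
    assert not derives((n + 2) % 3, s)
```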
In fact, not only are Li context-free languages; they are actually regular languages. So there is a regular expression equivalent to each of them. For example, I believe that L0 is:
(D0|D2D0*D1|(D1|D2D0*D2)(D0|D1D0*D2)*(D2|D1D0*D1))*
Now, how can this be repeated for binary numbers divisible by 5?
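One possible way to carry the idea over, sketched under my own assumptions: powers of 2 are not all congruent mod 5, so instead of splitting off the first digit it is easier to read the number from the left and track the value so far, using value(δd) = 2 × value(δ) + d. The non-terminal names S and N0..N4 and the helper functions are hypothetical.

```python
def build_rules():
    """Right-linear rules; N_r means 'the bits read so far are r mod 5'."""
    rules = {"S": [("1", "N1")]}  # the MSB must be 1
    for r in range(5):
        # N_r -> 0 N_{2r mod 5} | 1 N_{(2r+1) mod 5}
        rules[f"N{r}"] = [(bit, f"N{(2 * r + int(bit)) % 5}") for bit in "01"]
    rules["N0"].append(("", None))  # N0 -> ε
    return rules


def derives(rules, word):
    """Follow the deterministic rules along `word`, starting from S."""
    state = "S"
    for ch in word:
        targets = [nt for t, nt in rules[state] if t == ch]
        if not targets:
            return False
        state = targets[0]
    return ("", None) in rules[state]


rules = build_rules()
# bin(n)[2:] always starts with 1 for n >= 1.
for n in range(1, 2000):
    assert derives(rules, bin(n)[2:]) == (n % 5 == 0)
```

With these rules, 101 is derived as S → 1 N1 → 10 N2 → 101 N0 → 101.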
You can easily come up with a grammar that will give you all the even multiples of 5 (10, 20, 30...).
Now, after you have that one, you can concat the string '101' to it and get almost all the odd multiples.. you'll
hope this will help - it's probably not the smartest way though

How to write this CFG?

the question is to construct a CFG which generates language
my solution is: S -> aSb | aS | bS | a | b, however, this grammar can also generate strings like aabb, so how to do it?
Thanks for help.
So you want a string of a's then a string of b's, with an unequal number of a's and b's. First, let's ignore the inequality condition. Then (writing 0 for the empty string):
S -> aSb | 0
will generate all strings that start with a's and then b's. This rule guarantees an equal number of a's and b's, or the empty string. Now what we want is either more a's, or more b's, but not both. Because if we wanted one more a AND one more b, we'd just apply S again. So we add two new rules:
A -> aA
B -> bB
and update S to be:
S -> aSb | A | B
So now we can add an equal number of a and b, or add more a's, or add more b's, but not both. This guarantees inequality, so we're almost done. If you don't need the empty string, you can just stop here. For the null string, we can't do:
S -> aSb | A | B | 0,
because that can lead to S -> aSb -> a0b -> ab, which violates the condition. We also can't do:
A -> aA | 0,
because that can produce S -> aSb -> aAb -> a0b -> ab. So what do we do? The trick is to force S's later expansions to have at least one a or b, like this:
S -> aSb | aA | bB
A -> aA | 0
B -> bB | 0
and that's your solution.
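To convince yourself, here is a rough brute-force sketch that enumerates everything the final grammar derives up to a length bound and compares it with the strings a^i b^j, i ≠ j. The helper name language and the bound are my own choices, and the pruning only guarantees termination for grammars like this one, where sentential forms never hold more than one non-terminal.

```python
RULES = {
    "S": ["aSb", "aA", "bB"],
    "A": ["aA", ""],   # "" plays the role of 0 (the empty string)
    "B": ["bB", ""],
}


def language(rules, start, max_len):
    """All terminal words of length <= max_len derivable from `start`."""
    seen, stack, words = set(), [start], set()
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        # position of the leftmost non-terminal, if any
        idx = next((k for k, ch in enumerate(form) if ch in rules), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # prune forms that already carry too many terminals
            if sum(ch not in rules for ch in new) <= max_len:
                stack.append(new)
    return words


MAX_LEN = 8
generated = language(RULES, "S", MAX_LEN)
expected = {
    "a" * i + "b" * j
    for i in range(MAX_LEN + 1)
    for j in range(MAX_LEN + 1)
    if i != j and i + j <= MAX_LEN
}
assert generated == expected
```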

Converting grammar to Chomsky Normal Form?

Convert the grammar below into Chomsky Normal Form. Give all the intermediate steps.
S -> AB | aB
A -> aab|lambda
B -> bbA
Ok so the first thing I did was add a new start variable S0
so now I have
S0 -> S
S -> AB | aB
A -> aab|lambda
B -> bbA
then I removed all of the lambda rules:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Then I checked for S->S and A->B type rules which did not exist. And that was the answer I came up with, do I need to do anything further or did I do anything wrong?
Wikipedia says:
In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:
A -> BC, or
A -> α, or
S -> ε
where A, B, C are nonterminal symbols, α is a terminal symbol, S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.
Continuing your work:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Instead of using | to denote different choices, split a rule into multiple rules.
S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Create new rules Y -> a and Z -> b because we will need them soon.
S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b
S -> aB is not of the form S -> BC because a is a terminal. So change a into Y:
S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b
Do the same for the B -> bb rule:
S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> ZZ
Y -> a
Z -> b
For A -> aab, create C -> YY; for B -> bbA, create D -> ZZ:
S0 -> S
S -> AB
S -> YB
S -> B
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
For S -> B, duplicate the one rule where S occurs on the right hand side and inline the rule:
S0 -> B
S0 -> S
S -> AB
S -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
Deal with the rules S0 -> B and S0 -> S by joining the right hand side to the left hand sides of other rules. Also, delete the orphaned rules (where the LHS symbol never gets used on RHS):
S0 -> DA
S0 -> ZZ
S0 -> AB
S0 -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b
And we're done. Phew!
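A quick way to double-check a result like this is to expand the grammar mechanically. The sketch below is only an illustration (the names CNF, expand and is_cnf are mine): it encodes the final rules, checks that each one has a CNF shape, and lists every string derivable from S0. It relies on the grammar being non-recursive, so the expansion terminates.

```python
from itertools import product

CNF = {
    "S0": [["D", "A"], ["Z", "Z"], ["A", "B"], ["Y", "B"]],
    "A": [["C", "Z"]],
    "C": [["Y", "Y"]],
    "B": [["D", "A"], ["Z", "Z"]],
    "D": [["Z", "Z"]],
    "Y": [["a"]],
    "Z": [["b"]],
}


def is_cnf(grammar, start):
    """Each rule is A -> BC (B, C non-start non-terminals) or A -> terminal."""
    for rhss in grammar.values():
        for rhs in rhss:
            two_vars = (len(rhs) == 2 and
                        all(s in grammar and s != start for s in rhs))
            one_terminal = len(rhs) == 1 and rhs[0] not in grammar
            if not (two_vars or one_terminal):
                return False
    return True


def expand(symbol):
    """All terminal strings derivable from `symbol`."""
    if symbol not in CNF:
        return {symbol}
    out = set()
    for rhs in CNF[symbol]:
        for parts in product(*(expand(s) for s in rhs)):
            out.add("".join(parts))
    return out


assert is_cnf(CNF, "S0")
print(sorted(expand("S0")))
```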
Without getting into too much theory and proofs (you can look those up on Wikipedia), converting a Context Free Grammar to Chomsky Normal Form generally takes four Normal-Form Transformations. First, identify all the variables that can yield the empty string (lambda/epsilon), directly or indirectly, and eliminate the lambda rules (Lambda-Free form). Second, remove unit productions (Unit-Free form). Third, find all the variables that are live/useful (Usefulness). Fourth, find all the reachable symbols (Reachable). At each step you might or might not end up with a new grammar. So for your problem this is what I came up with...
Context-Free Grammar
G(Variables = { A B S }
Start = S
Alphabet = { a b lambda }
Production Rules = {
S -> | AB | aB |
A -> | aab | lambda |
B -> | bbA | } )
Remove lambda/epsilon
ERASABLE(G) = { A }
G(Variables = { A S B }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | B |
A -> | aab |
B -> | bbA | bb | } )
Remove unit productions
UNIT(A) { A }
UNIT(B) { B }
UNIT(S) { B S }
G (Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Determine live symbols
LIVE(G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Remove unreachable
REACHABLE (G) = { b A B S a }
G(Variables = { A B S }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | aB | bb | bbA |
A -> | aab |
B -> | bbA | bb | })
Replace all mixed strings with solid nonterminals
G( Variables = { A S B R I }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IIA |
A -> | RRI |
B -> | IIA | II |
R -> | a |
I -> | b | })
Chomsky Normal Form
G( Variables = { V A B S R L I Z }
Start = S
Alphabet = { a b }
Production Rules = {
S -> | AB | RB | II | IV |
A -> | RL |
B -> | IZ | II |
R -> | a |
I -> | b |
L -> | RI |
Z -> | IA |
V -> | IA | })
Alternative answer: The grammar can only produce a finite number of strings, namely 6.
S -> aabbbaab | aabbb | bbaab | bb | abbaab | abb.
You can now condense this back to Chomsky Normal Form by hand.
By substitution, we can find the set of all strings produced. Your initial rules:
S -> AB | aB.
A -> aab | lambda.
B -> bbA.
First split up the S rule:
S -> AB.
S -> aB.
Now substitute what A and B expand into:
S -> AB
-> (aab | lambda) bbA
-> (aab | lambda) bb (aab | lambda).
S -> aB
-> abbA
-> abb (aab | lambda).
Expand these again to get:
S -> aabbbaab.
S -> aabbb.
S -> bbaab.
S -> bb.
S -> abbaab.
S -> abb.
To change this finite set to Chomsky Normal Form, it suffices to do it by brute force without any intelligent factoring. First we introduce two terminal rules:
X -> a.
Y -> b.
Now for each string, we consume the first letter with a terminal variable and the remaining letters with a new variable. For example, like this:
S -> aabbb. (initial rule, not in Chomsky Normal Form)
S -> XC, where X->a and C->abbb.
C -> XD, where X->a and D->bbb.
D -> YE, where Y->b and E->bb.
E -> YY, where Y->b and Y->b.
We just go through this process for all 6 strings, generating a lot of new intermediate variables.
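The brute-force process can be sketched in a few lines. This is only an illustration under stated assumptions: the fresh variable names V1, V2, ... and the helpers chain_rules and expand are made up, and every word is assumed to have length at least 2 (true for all six strings here).

```python
from itertools import count, product

WORDS = ["aabbbaab", "aabbb", "bbaab", "bb", "abbaab", "abb"]
TERMINAL_VAR = {"a": "X", "b": "Y"}  # X -> a, Y -> b


def chain_rules(words, start="S"):
    """One chain of binary rules per word; assumes len(word) >= 2."""
    rules = [(var, (ch,)) for ch, var in TERMINAL_VAR.items()]
    fresh = (f"V{n}" for n in count(1))
    for word in words:
        head = start
        while len(word) > 2:
            tail = next(fresh)
            # consume the first letter, hand the rest to a new variable
            rules.append((head, (TERMINAL_VAR[word[0]], tail)))
            head, word = tail, word[1:]
        # the last two letters become a pair of terminal variables
        rules.append((head, tuple(TERMINAL_VAR[ch] for ch in word)))
    return rules


def expand(rules, symbol):
    """All terminal strings derivable from `symbol` (rules are non-recursive)."""
    if symbol in TERMINAL_VAR:  # symbol is the terminal a or b
        return {symbol}
    out = set()
    for lhs, rhs in rules:
        if lhs == symbol:
            parts = product(*(expand(rules, s) for s in rhs))
            out |= {"".join(p) for p in parts}
    return out


rules = chain_rules(WORDS)
assert expand(rules, "S") == set(WORDS)
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

For aabbb this produces S -> X V, V -> X V', V' -> Y V'', V'' -> Y Y with fresh variables in place of C, D and E.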