How can I prove this language is regular? - finite-automata

I'm trying to prove that this language:
L = { w ∈ {0,1}* | #0(w) % 3 = 0 } (the number of 0's is divisible by 3)
is regular using the pumping lemma, but I can't find a way to do it. All the other examples I have seen have a simple, more clearly defined form, such as w = a^x b^y c^z.

I don't think you can use the pumping lemma to prove that a language is regular. To prove that a language is regular, you just need to give a regular expression or a DFA for it. In this case the regular expression is quite easy:
1*(01*01*01*)*
(Proof: the regular expression clearly does not accept any string whose number of 0's is not divisible by 3, so we only need to show that every string whose number of 0's is divisible by 3 is accepted. A string containing 3n 0's has the form 1^(n_0) 0 1^(n_1) 0 1^(n_2) 0 1^(n_3) ... 0 1^(n_(3n-2)) 0 1^(n_(3n-1)) 0 1^(n_(3n)), where each n_k ≥ 0 can be chosen so that the form matches the string, and this form is clearly accepted by the regular expression.)
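The claim that the regular expression accepts exactly the strings whose number of 0's is divisible by 3 can be spot-checked by brute force. The following is only a sketch: the helper names and the length bound are my own choices, and a finite check supports the argument above but does not replace it.

```python
import re
from itertools import product

# The regular expression from the answer; fullmatch() anchors it at both ends.
PATTERN = re.compile(r"1*(01*01*01*)*")

def zeros_divisible_by_three(w: str) -> bool:
    """Direct definition of the language: the count of 0's is a multiple of 3."""
    return w.count("0") % 3 == 0

def regex_agrees_up_to(max_len: int) -> bool:
    """Compare the regex with the definition on every binary string up to max_len."""
    for length in range(max_len + 1):
        for chars in product("01", repeat=length):
            w = "".join(chars)
            if (PATTERN.fullmatch(w) is not None) != zeros_divisible_by_three(w):
                return False
    return True

print(regex_agrees_up_to(10))
```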
The pumping lemma cannot be used to prove that a language is regular, because we are not free to choose y ourselves as in Daniel Martin's answer. Here is a counter-example, in a similar format to his answer (please correct me if I'm doing something fundamentally different from his answer):
We "prove" that the language L = { w = 0^n 1^p | n ∈ N, n > 0, p is prime } is regular using the pumping lemma as follows: note that there is at least one occurrence of 0, so we take y to be 0, and we have x y^k z = 0^(n+k-1) 1^p, which still satisfies the language definition. Therefore L is regular.
But this is false, since we know that the language of strings of prime length is not regular. The problem here is that we cannot just set y to any character we like.

Any string in this language with at least three characters in it has this property: either the string has a "1" in it, or there are three "0"s in a row.
If the string contains a 1, then you can split it as in the pumping lemma and set y equal to some 1 in the string. Then obviously the strings xyz, xyyz, xyyyz, etc. are all in the language because all those strings have the same number of zeros.
If the string does not contain a 1, it contains three 0s in a row. Setting y to those three 0s, it should be obvious that xyz, xyyz, xyyyz, etc. are all in the language because you're adding three 0 characters each time, so you always have a number of 0s divisible by 3.
@justhalf in the comments is perfectly correct; the pumping lemma can be used to prove that a regular language can be pumped or that a language that cannot be pumped is not regular, but you cannot use the pumping lemma to prove that a language is regular in the first place. Mea culpa.
Instead, here's a proof that the given language is regular based on the Myhill-Nerode Theorem:
Consider the set of all strings of 0s and 1s. Divide these strings into three sets:
E0, all strings such that the number of 0s is a multiple of three,
E1, all strings such that the number of 0s is one more than a multiple of three,
E2, all strings such that the number of 0s is two more than a multiple of three.
Obviously, every string of 0s and 1s is in one of these three sets.
Furthermore, if x and z are both strings of 0s and 1s, then consider what it means if the concatenation xz is in L:
If x is in E0, then xz is in L if and only if z is in E0
If x is in E1, then xz is in L if and only if z is in E2
If x is in E2, then xz is in L if and only if z is in E1
Therefore, in the language of the theorem, there is no distinguishing extension for any two strings in the same one of our three Ei sets, and therefore there are at most three equivalence classes. A finite number of equivalence classes means the language is regular.
(in fact, there are exactly three equivalence classes, but that isn't needed)

A language is regular if and only if some nondeterministic finite automaton recognizes it.
An automaton is a finite state machine.
We have to build an automaton that recognizes L.
For each state, think along these lines:
"Where am I?"
"Where can I go to, with some given entry?"
So, for L = { w ∈ {0,1}* | #0(w) % 3 = 0 }
The possibilities (states) are:
The remainder of the division is 0, 1 or 2, which means we need three states.
Let q0, q1 and q2 be the states that represent the remainders 0, 1 and 2, respectively.
q0 is both the start state and the final state.
Now, for "0" entries, do the math #0(w) % 3 and go to the appropriate state.
Transition functions:
f(q0, 0) = q1
f(q1, 0) = q2
f(q2, 0) = q0
For "1" entries, it just loops wherever it is, 'cause it doesn't change the machine state.
f(qx, 1) = qx
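The transition function above can be simulated directly. This is a minimal sketch in which the states q0, q1, q2 are encoded as the integers 0, 1, 2; the names DELTA and accepts are only illustrative.

```python
# State i means "the number of 0s read so far is congruent to i modulo 3".
DELTA = {
    (0, "0"): 1, (1, "0"): 2, (2, "0"): 0,  # a 0 advances the remainder
    (0, "1"): 0, (1, "1"): 1, (2, "1"): 2,  # a 1 loops on the current state
}
START = 0
ACCEPTING = {0}

def accepts(w: str) -> bool:
    """Run the DFA on w and report whether it ends in an accepting state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

for w in ["", "1", "000", "0100", "00", "1010"]:
    print(repr(w), accepts(w))
```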
The pumping lemma is used to prove that a language is not regular; it cannot prove that a language is regular.
Here is a good book for theory of computation: Introduction to the Theory of Computation 3rd Edition
by Michael Sipser.

Squeak Smalltalk: do +, -, *, / have higher precedence than power?

I understand that in Smalltalk arithmetic, without parentheses, everything is calculated from left to right. Multiplication and division do not take precedence over addition and subtraction.
Take the following code:
3 + 3 * 2
The print output is 12 while in mathematics we get 9
But when I started to try power calculation, like
91 raisedTo: 3 + 1.
I thought the answer should be 753572
What I actually get is 68574961
Why's that?
Is it because +, -, *, / have higher precedence than power?
Smalltalk does not have operators with precedence. Instead, there are three different kinds of messages. Each kind has its own precedence.
They are:
unary messages that consist of a single identifier and do not have parameters, such as squared or asString in 3 squared or order asString;
binary messages that have a selector composed of !%&*+,/<=>?@\~- symbols and have one parameter, such as + and -> in 3 + 4 or key -> value;
keyword messages that have one or more parameters and a selector with a colon before each parameter, such as raisedTo: and to:by:do: in 4 raisedTo: 3 and 1 to: 10 by: 3 do: [ … ].
Unary messages have precedence over binary and both of them have precedence over keyword messages. In other words:
unary > binary > keyword
So, for example,
5 raisedTo: 7 - 2 squared
evaluates to 125, because first the unary 2 squared is evaluated, resulting in 4, then the binary 7 - 4 is evaluated, resulting in 3, and finally the keyword 5 raisedTo: 3 evaluates to 125.
Of course, parentheses have the highest precedence of everything.
To make this concept easier to understand, don't think about numbers and math: all the numbers are objects and all the operators are messages. The reason for this is that a + b * c does not mean that a, b, and c are numbers. They can be humans, cars, online store articles. And they can define their own + and * methods, but this does not mean that * (which is not a "multiplication", it's just a "star message") should happen before +.
Yes, +, -, *, / have more precedence than raisedTo:, and the interesting aspect of this is the reason why this happens.
In Smalltalk there are three types of messages: unary, binary and keyword. In our case, +, -, * and / are examples of binary messages, while raisedTo: is a keyword one. You can tell this because binary selectors are made from characters that are not letters or numbers, unlike unary or keyword selectors, which start with a letter or underscore and continue with numbers, letters or underscores. Also, you can tell when a selector is unary because it does not end with a colon. Thus, raisedTo: is a keyword message because it ends with a colon (and is not made of non-letter, non-numeric symbols).
So, the expression 91 raisedTo: 3 + 1 includes two selectors, one binary (+) and one keyword (raisedTo:) and the precedence rule says:
first evaluate unary messages, then binary ones and finally those with keywords
This is why 3 + 1 gets evaluated first. Of course, you can always change the precedence using parentheses. For example:
(91 raisedTo: 3) + 1
will evaluate first raisedTo: and then +. Note that you could write
91 raisedTo: (3 + 1)
too. But this is usually not done because Smalltalk precedence rules are so easy to remember that you don't need to emphasize them.
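To make the grouping concrete, the expressions discussed here can be mirrored in Python with explicit parentheses. This is only an arithmetic sketch: Python has its own precedence rules, so the parentheses below are what reproduce the Smalltalk evaluation order, and the variable names are mine.

```python
# Smalltalk: 3 + 3 * 2        (binary messages run strictly left to right)
left_to_right = (3 + 3) * 2

# Smalltalk: 91 raisedTo: 3 + 1    (binary + is sent before keyword raisedTo:)
binary_first = 91 ** (3 + 1)

# Smalltalk: (91 raisedTo: 3) + 1  (parentheses force raisedTo: first)
keyword_first = (91 ** 3) + 1

# Smalltalk: 5 raisedTo: 7 - 2 squared   (unary, then binary, then keyword)
unary_binary_keyword = 5 ** (7 - (2 ** 2))

print(left_to_right, binary_first, keyword_first, unary_binary_keyword)
# prints: 12 68574961 753572 125
```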
Commonly used binary selectors
@ the Point creation message for x @ y
>= greater or equal, etc.
-> the Association message for key -> value
==> production transformation used by PetitParser
= equal
== identical (very same object)
~= not equal
~~ not identical
\\ remainder
// quotient
and a lot more. Of course, you are always entitled to create your own.

Solve three letter string into regular expression way

I need help writing a regular expression for the following: the language of all strings over Σ = {X, Y, Z} with Y as the third letter and Z as the second-to-last letter.
If you are allowed to use intersection (which does preserve rationality), I would state it simply as ΣΣYΣ* & Σ*ZΣ. If you feed this to Vcsn to normalize it, you get:
In [1]: import vcsn
In [2]: vcsn.B.expression('([XYZ]{2}Y[XYZ]*)&([XYZ]*Z[XYZ])').derived_term().expression()
Out[2]: (X+Y+Z)ZY+(X+Y+Z)(X+Y+Z)Y(X+Y+Z)*Z(X+Y+Z)
The call to derived_term is to build an automaton from the expression, and the last call to expression is to extract a rational expression from this automaton.
Given Σ = {X, Y, Z}, you need to construct the language of all strings over it with Y as the third letter and Z as the second-to-last letter.
"ΣΣYΣ*ZΣ | ΣZY" is the required regular expression.
Σ* contains all strings that are concatenations of zero or more symbols from Σ.
As you can see, Y is the third letter and Z is placed in the second-to-last position, and each Σ can be replaced by any of X, Y or Z. The second alternative, ΣZY, covers the strings of length three, where the second-to-last letter is the second letter.
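The expression ΣΣYΣ*ZΣ | ΣZY can be checked against the plain definition of the language by enumerating short strings. The sketch below translates Σ into the character class [XYZ] for Python's re module; the function names and the length bound are assumptions made for illustration.

```python
import re
from itertools import product

# ΣΣYΣ*ZΣ | ΣZY with Σ written as the character class [XYZ].
PATTERN = re.compile(r"[XYZ]{2}Y[XYZ]*Z[XYZ]|[XYZ]ZY")

def in_language(w: str) -> bool:
    """Y is the third letter and Z is the second-to-last letter."""
    return len(w) >= 3 and w[2] == "Y" and w[-2] == "Z"

def regex_agrees_up_to(max_len: int) -> bool:
    """Compare the regex with the definition on every string up to max_len."""
    for length in range(max_len + 1):
        for chars in product("XYZ", repeat=length):
            w = "".join(chars)
            if (PATTERN.fullmatch(w) is not None) != in_language(w):
                return False
    return True

print(regex_agrees_up_to(7))
```

Note that no string of length four qualifies, because the third letter would have to be both Y and Z.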
I think the regular expression should be like this:
(x+y+z)zy(x+y+z)^*

Minimum number of states in a DFA having '1' as the 5th symbol from right

What is the minimum number of states needed in a DFA to accept the strings having '1' as 5th symbol from right? Strings are defined over the alphabet {0,1}.
The Myhill-Nerode theorem is a useful tool for solving these sorts of problems.
The idea is to build up a set of equivalence classes of strings, using the idea of "distinguishing extensions". Consider two strings x and y. If there exists a string z
such that exactly one of xz and yz is in the language, then z is a distinguishing extension,
and x and y must belong to different equivalence classes. Each equivalence class maps to a different state in the minimal DFA.
For the language you've described, let x and y be any pair of different 5-character strings
over {0,1}. If they differ at position n (counting from the right, starting at 1), then any string z with length 5-n will be a distinguishing extension: if x has a 0 at position n,
and y has a 1 at position n, then xz is rejected and yz is accepted. This gives 2^5 = 32
equivalence classes.
If s is a string with length k < 5 characters, it belongs to the same equivalence class
as 0^(5-k)s (i.e. add 0-padding to the left until it's 5 characters long).
If s is a string with length k > 5 characters, its equivalence class is determined by its final 5 characters.
Therefore, all strings over {0,1} fall into one of the 32 equivalence classes described above, and by the Myhill-Nerode theorem, the minimal DFA for this language has 32 states.
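The count of equivalence classes can be confirmed mechanically. In this sketch each string is summarized by a "signature": which extensions z of length at most 5 put it in the language. Longer extensions cannot distinguish anything, because membership then depends on z alone. The names below are illustrative, not from the answer.

```python
from itertools import product

def in_language(w: str) -> bool:
    """The 5th symbol from the right is 1."""
    return len(w) >= 5 and w[-5] == "1"

# All extensions z with 0 <= |z| <= 5.
EXTENSIONS = ["".join(p) for k in range(6) for p in product("01", repeat=k)]

def signature(x: str) -> tuple:
    """Record, for each extension z, whether xz is in the language."""
    return tuple(in_language(x + z) for z in EXTENSIONS)

five_bit = ["".join(p) for p in product("01", repeat=5)]
classes = {signature(x) for x in five_bit}
print(len(classes))  # one class per 5-character string

# Shorter strings behave like their 0-padded versions,
# longer strings like their final 5 characters.
print(signature("1") == signature("00001"))
print(signature("1101010") == signature("01010"))
```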
The number of states is 2^n when the constrained symbol is the nth symbol from the right.
So the number of states is 2^5 = 32.

linear grammar with unequal number of 0s and 1s

Is it possible to come up with a linear grammar with unequal number of 0s and 1s?
Such as 0100, 01100, 111,1,0, 100101001...
I know there is a context-free grammar for this, but is there a linear grammar?
Thanks.
A grammar is regular if and only if it is either left regular or right regular. The left regular grammars are equivalent to the left linear grammars. The right regular grammars are equivalent to the right linear grammars. Therefore, if a regular grammar exists that generates the indicated language, then it is either right or left regular, and hence equivalent to either a left or right linear grammar.
edit1:
Note that there's no regular grammar generating the indicated language L_UNEQ. To see this, consider the fact that L_EQ = { w : n_a(w) = n_b(w) } is the complement of L_UNEQ. Because the regular languages are closed under complementation and L_EQ is not a regular language, L_UNEQ is not a regular language.
edit2:
I believe the pumping lemma for linear languages can be used to show that the indicated language L_UNEQ is not linear. Here is what I've come up with. I'm fairly confident it's correct. My primary concern is that you were asked, presumably, for a linear grammar generating the indicated language; however, I came to the conclusion that there is no such grammar.
Assume L_UNEQ is linear. By the pumping lemma for linear languages, there exists an n > 0 depending on L_UNEQ such that every z ∈ L_UNEQ with |z| ≥ n can be written as uvwxy where:
|vx| > 0,
|uvxy| ≤ n, and
u v^i w x^i y ∈ L_UNEQ for all i ≥ 0.
Let n be the constant guaranteed by the pumping lemma. Consider the string
z = a^n b^(n! + 2n) a^n
Since z ∈ L_UNEQ and |z| ≥ n, it can be decomposed into substrings uvwxy satisfying the constraints of the pumping lemma such that, for all i ≥ 0, the string
u v^i w x^i y = a^|u| a^(i|v|) a^(n - |u| - |v|) b^(n! + 2n) a^(n - |x| - |y|) a^(i|x|) a^|y|
is a member of L_UNEQ. (Because |uvxy| ≤ n, u and v lie within the leading a^n, and x and y within the trailing a^n.) Since 1 ≤ |vx| ≤ n, |vx| divides n!. Hence, (n!/|vx| + 1) is a natural number. Setting i to (n!/|vx| + 1) gives the string
z' = u v^(n!/|vx| + 1) w x^(n!/|vx| + 1) y = a^|u| a^((n!/|vx| + 1)|v|) a^(n - |u| - |v|) b^(n! + 2n) a^(n - |x| - |y|) a^((n!/|vx| + 1)|x|) a^|y|
Simplifying the pumped string gives us an equal number of a's and b's:
n_a(z') = 2n - |vx| + (n!/|vx| + 1)|vx| = 2n + n!
Since (2n + n!) is equal to the number of b's in the pumped string, z' ∉ L_UNEQ. But the pumping lemma says z' ∈ L_UNEQ, which contradicts the assumption that L_UNEQ is a linear language. Hence, L_UNEQ is not a linear language.
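The arithmetic in the last step can be checked numerically for small n. The sketch below tries every possible value of |vx| between 1 and n and confirms that the chosen pumping exponent always produces exactly n! + 2n a's; the function names are hypothetical.

```python
from math import factorial

def pumped_a_count(n: int, vx_len: int) -> int:
    """Number of a's after pumping v and x together i = n!/|vx| + 1 times."""
    i = factorial(n) // vx_len + 1      # exact, since vx_len <= n divides n!
    return 2 * n - vx_len + i * vx_len  # original a's minus vx, plus i copies of vx

def b_count(n: int) -> int:
    """Number of b's in z, which pumping never changes."""
    return factorial(n) + 2 * n

def counts_always_match(max_n: int) -> bool:
    return all(
        pumped_a_count(n, vx_len) == b_count(n)
        for n in range(1, max_n + 1)
        for vx_len in range(1, n + 1)
    )

print(counts_always_match(8))
```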

Minimum number of states needed?

Definition of a language L with alphabet { a } is given as following
L = { a^(nk) | k > 0, and n is a positive integer constant }
What is the number of states needed in a DFA to recognize L?
In my opinion it should be k+1 but I am not sure.
The language L can be recognized by a DFA with n+1 states.
Observe that the length of any string in L is congruent to 0 mod n.
Label n of the states with the integers 0, 1, 2, ..., n-1, representing each possible remainder. An additional state, S, is the start state. S has a single transition, to state 1 mod n. If the machine is currently in state i, on input a it moves to state (i+1) mod n. State 0 is
the only accepting state. (If the empty string were part of L, we could eliminate S and make state 0 the start state).
Suppose there were a DFA with fewer than n+1 states that still recognized L. Consider the sequence of states S_0, S_1, ..., S_n encountered while processing the string a^n. S_n must be an accepting state, since a^n is in L. But since there are fewer than n+1 distinct states in this DFA, by the pigeonhole principle there must have been some state that was visited at least twice. Removing that loop gives another path (and another accepted string), with length < n, from S_0 to S_n. But L contains no strings shorter than n, contradicting our assumption. Therefore no DFA with fewer than n+1 states recognizes L.
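The (n+1)-state construction described above can be written out as follows. This is a sketch with an assumed encoding: the start state is the string "S", the remainder states are the integers 0 to n-1, and build_dfa and accepts are illustrative names.

```python
def build_dfa(n: int) -> dict:
    """Transition map on the single input symbol 'a' for the (n+1)-state DFA."""
    delta = {"S": 1 % n}          # the start state moves to state 1 (state 0 when n == 1)
    for i in range(n):
        delta[i] = (i + 1) % n    # remainder states cycle modulo n
    return delta

def accepts(n: int, w: str) -> bool:
    """True when w = a^(nk) for some k > 0."""
    delta = build_dfa(n)
    state = "S"
    for symbol in w:
        if symbol != "a":
            return False          # the alphabet is { a }
        state = delta[state]
    return state == 0             # state 0 is the only accepting state

n = 3
print(len(build_dfa(n)))          # number of states: n + 1
print([m for m in range(10) if accepts(n, "a" * m)])
```

Keeping S separate from state 0 is what rejects the empty string, which matches the remark that S could be dropped if the empty string were part of L.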