Solve three letter string into regular expression way - finite-automata

I need help writing a regular expression for the following language: all strings over Σ = {X, Y, Z} with Y as the third letter and Z as the second-to-last letter.

If you are allowed to use intersection (which does preserve rationality), I would state it simply as ΣΣYΣ* & Σ*ZΣ. If you feed this to Vcsn to normalize it, you get:
In [1]: import vcsn
In [2]: vcsn.B.expression('([XYZ]{2}Y[XYZ]*)&([XYZ]*Z[XYZ])').derived_term().expression()
Out[2]: (X+Y+Z)ZY+(X+Y+Z)(X+Y+Z)Y(X+Y+Z)*Z(X+Y+Z)
The call to derived_term is to build an automaton from the expression, and the last call to expression is to extract a rational expression from this automaton.

Given Σ = {X, Y, Z}, you need to construct the language of all strings over it with Y as the third letter and Z as the second-to-last letter.
"ΣΣYΣ*ZΣ | ΣZY" will be the required regular expression.
Σ* contains all strings that are concatenations of zero or more symbols from Σ.
As you can see, Y is the third symbol and Z is in the second-to-last position, and each Σ can be replaced with any of X, Y or Z. The second alternative, ΣZY, covers the only strings of length 3; no string of length 4 qualifies, because its third letter would also be its second-to-last letter.
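The expression above can be sanity-checked by brute force. The following is only a sketch: the names `PATTERN`, `in_language` and `first_disagreement` are made up for this illustration, Σ is written as the character class `[XYZ]`, and the check covers strings up to a fixed length rather than proving equivalence.

```python
import re
from itertools import product

SIGMA = "XYZ"

# Python spelling of  ΣΣYΣ*ZΣ | ΣZY  with Σ written as [XYZ].
PATTERN = re.compile(r"[XYZ]{2}Y[XYZ]*Z[XYZ]|[XYZ]ZY")


def in_language(word):
    """Direct definition: third letter is Y and second-to-last letter is Z."""
    return len(word) >= 3 and word[2] == "Y" and word[-2] == "Z"


def first_disagreement(max_len=7):
    """Return the first word on which regex and definition differ, else None."""
    for length in range(max_len + 1):
        for letters in product(SIGMA, repeat=length):
            word = "".join(letters)
            if bool(PATTERN.fullmatch(word)) != in_language(word):
                return word
    return None


if __name__ == "__main__":
    print(first_disagreement())
```

If the function returns `None`, the regular expression and the direct definition agree on every string up to `max_len` letters.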

I think the regular expression should be like this:
(x+y+z)zy(x+y+z)^*

Related

T-SQL RegEx Matching "One or More" Operator

In MS SQL, is there an operator that allows matching one or more characters? (I'm curious whether it's implemented explicitly in T-SQL; other solutions are certainly possible, one of which I use in my question example below.)
I know in SQL, this could be explicitly implemented to varying degrees of success with the wildcard/like approach:
SELECT *
FROM table
-- finds letters aix and then anything following it
WHERE column LIKE 'aix_x%'
In Python, the '+' operator allows for this:
import re
str = "The rain in Spain falls mainly in the plain!"
# Check if the string contains "ai" followed by 1 or more "x" characters:
# finds 'ai' + one or more letters x
x = re.findall("aix+", str)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Check if the string contains "ai" followed by 1 or more "x" characters:
finds 'ai' + one or more letters x
If this is what you want, then:
where str like '%aix%'
does what you want.
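The LIKE '%aix%' suggestion rests on the observation that a string contains "ai" followed by one or more "x" exactly when it contains the substring "aix". A small Python sketch of that equivalence (the helper names and the sample strings are invented for this illustration):

```python
import re


def has_ai_x_plus(text):
    """True if text contains 'ai' followed by one or more 'x' (regex aix+)."""
    return re.search(r"aix+", text) is not None


def has_aix_substring(text):
    """True if text contains the literal substring 'aix' (like '%aix%')."""
    return "aix" in text


# Hypothetical sample values, not taken from the question.
SAMPLES = ["aix", "aixxx", "rain", "ai_x", "xaixy", "", "ai", "plaix!"]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), has_ai_x_plus(sample), has_aix_substring(sample))
```

Both checks return the same answer on every sample, which is why the substring search is enough when only a yes/no answer is needed.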
If you want to match a literal underscore, note that an underscore is a wildcard in LIKE expressions. Probably the simplest method in SQL Server is to use a character class:
where str like '%ai[_]x%'
Another solution is:
where str like '%ai$_x%' escape '$'

How can I prove this language is regular?

I'm trying to prove if this language:
L = { w ∈ {0,1}* | #0(w) % 3 = 0 } (the number of 0's is divisible by 3)
is regular using the pumping lemma, but I can't find a way to do it. All other examples I got, have a simple form or let's say a more defined form such as w = axbycz etc.
I don't think you can use pumping lemma to prove that a language is regular. To prove a language is regular, you just need to give a regular expression or a DFA. In this case the regular expression is quite easy:
1*(01*01*01*)*
(Proof: the regular expression clearly does not accept any string whose number of 0's is not divisible by 3, so we only need to show that every string whose number of 0's is divisible by 3 is accepted. A string that contains 3n 0's has the form 1^{n_0} 0 1^{n_1} 0 1^{n_2} 0 1^{n_3} ... 0 1^{n_{3n-2}} 0 1^{n_{3n-1}} 0 1^{n_{3n}}, where the exponents n_k >= 0 can be chosen to match the string, and this form is clearly accepted by the regular expression.)
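The claim about the regular expression can also be checked exhaustively on short strings. This is a sketch rather than a proof; the names `PATTERN` and `first_disagreement` and the length bound are choices made for this illustration.

```python
import re
from itertools import product

# The regular expression from the answer, in Python syntax.
PATTERN = re.compile(r"1*(01*01*01*)*")


def first_disagreement(max_len=10):
    """Return the first binary string on which the regex and the
    'number of 0s divisible by 3' definition differ, else None."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            by_regex = PATTERN.fullmatch(word) is not None
            by_count = word.count("0") % 3 == 0
            if by_regex != by_count:
                return word
    return None


if __name__ == "__main__":
    print(first_disagreement())
```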
The pumping lemma cannot be used to prove that a language is regular, because we cannot simply choose y as in Daniel Martin's answer. Here is a counter-example, in a similar format to his answer (please correct me if I'm doing something fundamentally different from his answer):
We prove that the language L = {w = 0^n 1^p | n ∈ N, n > 0, p is prime} is regular using the pumping lemma as follows: note that there is at least one occurrence of 0, so we take y as 0, and we have x y^k z = 0^{n+k-1} 1^p, which still satisfies the language definition. Therefore L is regular.
But this is false, since we know that a sequence with prime-numbered length is not regular. The problem here is that we cannot just set y to any character.
Any string in this language with at least three characters in it has this property: either the string has a "1" in it, or there are three "0"s in a row.
If the string contains a 1, then you can split it as in the pumping lemma and set y equal to some 1 in the string. Then obviously the strings xyz, xyyz, xyyyz, etc. are all in the language because all those strings have the same number of zeros.
If the string does not contain a 1, it contains three 0s in a row. Setting y to those three 0s, it should be obvious that xyz, xyyz, xyyyz, etc. are all in the language because you're adding three 0 characters each time, so you always have a number of 0s divisible by 3.
@justhalf in the comments is perfectly correct; the pumping lemma can be used to prove that a regular language can be pumped or that a language that cannot be pumped is not regular, but you cannot use the pumping lemma to prove that a language is regular in the first place. Mea culpa.
Instead, here's a proof that the given language is regular based on the Myhill-Nerode Theorem:
Consider the set of all strings of 0s and 1s. Divide these strings into three sets:
E0, all strings such that the number of 0s is a multiple of three,
E1, all strings such that the number of 0s is one more than a multiple of three,
E2, all strings such that the number of 0s is two more than a multiple of three.
Obviously, every string of 0s and 1s is in one of these three sets.
Furthermore, if x and z are both strings of 0s and 1s, then consider what it means if the concatenation xz is in L:
If x is in E0, then xz is in L if and only if z is in E0
If x is in E1, then xz is in L if and only if z is in E2
If x is in E2, then xz is in L if and only if z is in E1
Therefore, in the language of the theorem, there is no distinguishing extension for any two strings in the same one of our three Ei sets, and therefore there are at most three equivalence classes. A finite number of equivalence classes means the language is regular.
(in fact, there are exactly three equivalence classes, but that isn't needed)
A language is regular if and only if some nondeterministic finite automaton recognizes it.
An automaton is a finite state machine.
We have to build an automaton that recognizes L.
For each state, thinking like:
"Where am I?"
"Where can I go to, with some given entry?"
So, for L = { w ∈ {0,1}* | #0(w) % 3 = 0 }
The possibilities (states) are:
The remainder of the division by 3 is 0, 1 or 2, which means we need three states.
Let q0, q1 and q2 be the states that represent the remainders 0, 1 and 2, respectively.
q0 is the start and final state.
Now, for "0" inputs, compute #0(w) % 3 and go to the appropriate state.
Transition function:
f(q0, 0) = q1
f(q1, 0) = q2
f(q2, 0) = q0
For "1" inputs, the machine just loops in whatever state it is in, because a 1 doesn't change the number of 0s.
f(qx, 1) = qx
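The automaton described above can be simulated in a few lines. This is a minimal sketch: the integers 0, 1, 2 stand for q0, q1, q2, and the names `DELTA`, `START`, `ACCEPTING` and `accepts` are chosen for this illustration only.

```python
from itertools import product

# Transition table: reading "0" advances the remainder, reading "1" stays put.
DELTA = {(q, "0"): (q + 1) % 3 for q in range(3)}
DELTA.update({(q, "1"): q for q in range(3)})

START = 0        # q0 is the start state
ACCEPTING = {0}  # q0 is also the only final state


def accepts(word):
    """Run the DFA on word and report whether it ends in a final state."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


if __name__ == "__main__":
    for word in ["", "1", "000", "0101", "010"]:
        print(repr(word), accepts(word))
```

Comparing `accepts(w)` with `w.count("0") % 3 == 0` over all short binary strings is a quick way to confirm that the three states track the remainder correctly.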
The pumping lemma is used to prove that a language is not regular; it cannot show that a language is regular.
Here is a good book on the theory of computation: Introduction to the Theory of Computation, 3rd Edition, by Michael Sipser.

sympy subs in matrix doesn't change the values

I have a symbolic matrix that I want to differentiate. I have to substitute numeric values for some of the variables and then solve with respect to 6 unknowns. My problem is that defining the elements of matrix A with a lambda and substituting with subs doesn't change any value in the matrix. When I retrieve the type of the matrix, it is shown as immutable, which seems quite odd. Here's the code:
def optimalF1():
    x,y,z=symbols('x y z', Real=True)
    phi,theta,psi=symbols('phi theta psi')
    b1x,b1y=symbols('b1x b1y')
    b2x,b2y=symbols('b2x b2y')
    b3x,b3y=symbols('b3x b3y')
    b4x,b4y=symbols('b4x b4y')
    b5x,b5y=symbols('b5x b5y')
    b6x,b6y=symbols('b6x b6y')
    bMat=sym.Matrix(([b1x,b2x,b3x,b4x,b5x,b6x],
        [b1y,b2y,b3y,b4y,b5y,b6y],[0,0,0,0,0,0]))
    mov=np.array([[x],[y],[z]])
    Pi=np.repeat(mov,6,axis=1)
    sym.pprint(Pi)
    print 'shape of thing Pi', np.shape(Pi)
    p1x,p1y,p1z=symbols('p1x,p1y,p1z')
    p2x,p2y,p2z=symbols('p2x,p2y,p2z')
    p3x,p3y,p3z=symbols('p3x,p3y,p3z')
    p4x,p4y,p4z=symbols('p4x,p4y,p4z')
    p5x,p5y,p5z=symbols('p5x,p5y,p5z')
    p6x,p6y,p6z=symbols('p6x,p6y,p6z')
    #legs symbolic array
    l1,l2,l3,l4,l5,l6=symbols('l1,l2,l3,l4,l5,l6')
    piMat=Matrix(([p1x,p2x,p3x,p4x,p5x,p6x],[p1y,p2y,p3y,\
        p4y,p5y,p6y],[p1z,p2z,p3z,p4z,p5z,p6z]))
    piMat=piMat.subs('p1z',0)
    piMat=piMat.subs('p2z',0)
    piMat=piMat.subs('p3z',0)
    piMat=piMat.subs('p4z',0)
    piMat=piMat.subs('p5z',0)
    piMat=piMat.subs('p6z',0)
    sym.pprint(piMat)
    legStroke=np.array([[l1],[l2],[l3],[l4],[l5],[l6]])
    '''redefine the Eul matrix
    copy values of Pi 6 times by using np.repeat
    '''
    r1=[cos(phi)*cos(theta)*cos(psi)-sin(phi)*sin(psi),\
        -cos(phi)*cos(theta)*sin(psi)-sin(phi)*cos(psi),\
        cos(phi)*sin(theta)]
    r2=[sin(phi)*cos(theta)*cos(psi)+cos(phi)*sin(psi),\
        -sin(phi)*cos(theta)*sin(psi)+cos(phi)*cos(psi),\
        sin(phi)*sin(theta)]
    r3= [-sin(theta)*cos(psi),sin(theta)*sin(psi),cos(theta)]
    EulMat=Matrix((r1,r2,r3))
    print(EulMat)
    uvw=Pi+EulMat*piMat
    print 'uvw matrix is:\n', uvw, np.shape(uvw)
    # check thisout -more elegant and compact form
    A=Matrix(6,1,lambda j,i:((uvw[0,j]- \
        bMat[0,j])**2+(uvw[1,j]-bMat[1,j])**2+\
        (uvw[2,j]-bMat[2,j])**2)-legStroke[j]**2)
    print'A matrix before simplification:\n ', A
    B=simplify(A)
    B=B.subs({'x':1.37,'y':0,'z':0,theta:-1.37,phi:0})
    print'A matrix form after substituting:\n',B
So comparing A and B leads to the same output. I don't understand why!
When you use subs with variables that have assumptions, you have to use the symbols not strings. Using strings causes a new generic symbol to be created which does not match the symbol having assumptions so the subs fails.
>>> var('x')
x
>>> var('y',real=True)
y
>>> (x+y).subs('x',1).subs('y',2)
y + 1
Note, too, that to make real symbols you should use real=True not Real=True (lower case r).
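The same effect can be reproduced on a small matrix. The sketch below assumes SymPy is installed; the 1x1 matrix `M` and the variable names are invented for this illustration and are not taken from the question's code.

```python
from sympy import Matrix, symbols

x = symbols("x")             # generic symbol, no assumptions
y = symbols("y", real=True)  # symbol carrying the real assumption

M = Matrix([[x + y]])

# The string 'y' is turned into a plain Symbol('y') without assumptions,
# which is a different object from the real-valued y, so nothing is replaced.
by_string = M.subs("x", 1).subs("y", 2)

# Passing the Symbol objects themselves replaces both entries.
by_symbol = M.subs({x: 1, y: 2})

if __name__ == "__main__":
    print(by_string)
    print(by_symbol)
```

Keying the substitution dictionary on the symbol objects avoids the mismatch regardless of which assumptions the symbols carry.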

Contains at least a count of different character in a set

Assume that, I have a character set like this:
['a','b','c','x','y','z']
I want to build a regular expression which matches a certain number of these characters (for example 3).
Here are some examples of it:
ab - no match
xy - no match
abt - no match
aaa - no match
abc - match
yaz - match
yazx - match
ytaz - match
Can this be accomplished with a regular expression?
A simple solution would be a pattern like this:
(.*[abcxyz]){3}
This will match zero or more of any character, followed by one of a, b, c, x, y, or z, all of which must appear at least 3 times in the subject string.
To match only strings that contain different letters, you could use a negative lookahead ((?!…)) and a backreference (\N):
(.*([abcxyz])(?!.*\2)){3}
This will match zero or more of any character, followed by one of a, b, c, x, y, or z, as long as another instance of that character does not appear later in the string (i.e. it will match the last instance of that character in the string), all of which must appear at least 3 times in the subject string.
Of course, you can change the {3} to anything you like, but note that this only enforces a minimum: it will not work if you need to specify a maximum number of times these characters can appear in your string.
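The second pattern can be tried against the examples from the question. This is a sketch using Python's `re` module, which is an assumption since the question does not name a regex flavor; `EXPECTED` simply restates the question's examples.

```python
import re

# At least three *different* characters from the set a, b, c, x, y, z.
DISTINCT_THREE = re.compile(r"(.*([abcxyz])(?!.*\2)){3}")

# Examples from the question, mapped to whether they should match.
EXPECTED = {
    "ab": False,
    "xy": False,
    "abt": False,
    "aaa": False,
    "abc": True,
    "yaz": True,
    "yazx": True,
    "ytaz": True,
}


def matches(text):
    """True if text contains at least three distinct characters of the set."""
    return DISTINCT_THREE.search(text) is not None


if __name__ == "__main__":
    for text, expected in EXPECTED.items():
        print(text, matches(text), expected)
```

For long inputs the nested `.*` can backtrack heavily, so counting distinct characters with a set may be the more practical choice outside a pure-regex context.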

Minimum number of states in a DFA having '1' as the 5th symbol from right

What is the minimum number of states needed in a DFA to accept the strings having '1' as 5th symbol from right? Strings are defined over the alphabet {0,1}.
The Myhill-Nerode theorem is a useful tool for solving these sorts of problems.
The idea is to build up a set of equivalence classes of strings, using the idea of "distinguishing extensions". Consider two strings x and y. If there exists a string z
such that exactly one of xz and yz is in the language, then z is a distinguishing extension,
and x and y must belong to different equivalence classes. Each equivalence class maps to a different state in the minimal DFA.
For the language you've described, let x and y be any pair of different 5-character strings
over {0,1}. If they differ at position n (counting from the right, starting at 1), then any string z with length 5-n will be a distinguishing extension: if x has a 0 at position n,
and y has a 1 at position n, then xz is rejected and yz is accepted. This gives 2^5 = 32
equivalence classes.
If s is a string with length k < 5 characters, it belongs to the same equivalence class
as 0^(5-k)s (i.e. add 0-padding to the left until it's 5 characters long).
If s is a string with length k > 5 characters, its equivalence class is determined by its final 5 characters.
Therefore, all strings over {0,1} fall into one of the 32 equivalence classes described above, and by the Myhill-Nerode theorem, the minimal DFA for this language has 32 states.
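The pairwise-distinguishability half of this argument can be checked by brute force. The sketch below records, for every 5-character string, which extensions of length 0 to 4 lead to acceptance, and counts the distinct patterns; the function names and the parameter `k` are illustrative only, and the check does not cover the upper-bound half of the argument.

```python
from itertools import product


def accepts(word, k=5):
    """True if the k-th symbol from the right of word is '1'."""
    return len(word) >= k and word[-k] == "1"


def count_distinguishable(k=5):
    """Count k-character strings that are pairwise distinguishable
    using extensions of length 0 .. k-1."""
    extensions = [
        "".join(bits)
        for length in range(k)
        for bits in product("01", repeat=length)
    ]
    signatures = set()
    for bits in product("01", repeat=k):
        prefix = "".join(bits)
        # The signature lists, per extension z, whether prefix + z is accepted.
        signatures.add(tuple(accepts(prefix + z, k) for z in extensions))
    return len(signatures)


if __name__ == "__main__":
    print(count_distinguishable())
```

Each distinct signature corresponds to a state that any DFA for the language must keep separate, which gives the lower bound of 32.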
The number of states will be 2^n when the required symbol is the nth symbol from the right.
So the number of states will be 2^5 = 32.