Deterministic Finite State Automata for Modulo Comparison - finite-automata

I'm working on creating a deterministic finite automaton for the following problem:
You can create strings made of x's and y's. How do you draw a diagram that accepts exactly the strings in which (number of x's) mod 4 is greater than (number of y's) mod 4?
What I understand so far is that I need a diagram similar to the one below:
>(0,0) -b-> (0,1) -b-> (0,2) -b-> (0,3) -b-> (0,4)
a a a a a
(1,0) -b-> (1,1) -b-> (1,2) -b-> (1,3) -b-> (1,4)
a a a a a
(2,0) -b-> (2,1) -b-> (2,2) -b-> (2,3) -b-> (2,4)
a a a a a
(3,0) -b-> (3,1) -b-> (3,2) -b-> (3,3) -b-> (3,4)
a a a a a
(4,0) -b-> (4,1) -b-> (4,2) -b-> (4,3) -b-> (4,4)
But what I don't understand is how to compare the number of times x and y occur relative to one another.

I'll try to answer your question by defining the DFA's behaviour through a transition table instead of a transition diagram: a table is easier to type here, and it is functionally equivalent to the corresponding diagram.
In the table below
the first column lists all the states; every state is represented by an ordered pair whose elements represent, respectively, the number of x's and the number of y's read so far, each taken mod 4
the second and third columns contain, respectively, the state reached when reading an x or a y while the DFA is in the state shown in the first column of the table
the states highlighted in yellow are the accept states: if the DFA is in one of those states after examining the last symbol in the string then the string is accepted, that is it belongs to the language.
Looking at this table, you should easily be able to adapt your diagram so as to make things work properly. For example, the first line in the table corresponds to the following portion of your diagram
>(0,0) -y-> (0,1)
x
(1,0)
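The table described above can be regenerated programmatically. This is only a minimal sketch, assuming each state is the pair (number of x's mod 4, number of y's mod 4) and that a state is accepting when its first component is greater than its second; the names `build_table` and `accept_states` are illustrative and do not come from the answer.

```python
from itertools import product

def build_table(modulus=4):
    """Return {state: (state_on_x, state_on_y)} for states (x_count mod m, y_count mod m)."""
    table = {}
    for a, b in product(range(modulus), repeat=2):
        on_x = ((a + 1) % modulus, b)   # reading an x advances the first counter
        on_y = (a, (b + 1) % modulus)   # reading a y advances the second counter
        table[(a, b)] = (on_x, on_y)
    return table

table = build_table()
accept_states = {state for state in table if state[0] > state[1]}

# Print one row per state; accept states are marked with '*'
for state, (on_x, on_y) in table.items():
    mark = "*" if state in accept_states else " "
    print(f"{mark} {state} | x -> {on_x} | y -> {on_y}")
```

Note that the counters wrap around at 4, so the states (3,b) go back to (0,b) on x, and no state with a 4 in it is needed.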

You have pretty much arrived at the answer in the question itself.
First, work out which states you need. You need to track n(x) mod 4, which requires 4 states. You also need to track n(y) mod 4, which requires another 4 states. Since you need to track both at the same time, you combine the two automata with the product construction. This results in 4 x 4 = 16 states. Let us name those states (a, b), where a = n(x) mod 4 and b = n(y) mod 4.
Then you decide what the initial state and the final states are. In this case, the initial state is (0,0) and the final states are { (a,b) : a > b }.
So, this is what you get finally:
DFA drawer link
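To check candidate strings against this construction, the 16-state automaton can be simulated by tracking the pair (a, b) directly. This is a hedged sketch rather than the answerer's own code; the function name `accepts` is an assumption made for illustration.

```python
def accepts(word, modulus=4):
    """Simulate the product DFA: state (a, b) = (x count mod m, y count mod m)."""
    a, b = 0, 0  # initial state (0,0)
    for symbol in word:
        if symbol == "x":
            a = (a + 1) % modulus
        elif symbol == "y":
            b = (b + 1) % modulus
        else:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet {{x, y}}")
    return a > b  # final states are exactly those with a > b

for word in ["", "x", "xy", "xxy", "xxxx", "yyyyx"]:
    print(repr(word), accepts(word))
```

For instance, "xxxx" is rejected because four x's count as 0 mod 4, while "yyyyx" is accepted because the four y's wrap around to 0.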


How to find most similar numerical arrays to one array, using Numpy/Scipy?

Let's say I have a list of 5 words:
[this, is, a, short, list]
Furthermore, I can classify some text by counting the occurrences of the words from the list above and representing these counts as a vector:
N = [1,0,2,5,10] # 1x this, 0x is, 2x a, 5x short, 10x list found in the given text
In the same way, I classify many other texts (count the 5 words per text, and represent them as counts - each row represents a different text which we will be comparing to N):
M = [[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40],
...]
Now, I want to find the top 1 (2, 3, etc.) rows from M that are most similar to N. Or, in simple words, the texts most similar to my initial text.
The challenge is, just checking the distances between N and each row from M is not enough, since for example row M4 [4,0,8,20,40] is very different by distance from N, but still proportional (by a factor of 4) and therefore very similar. For example, the text in row M4 can be just 4x as long as the text represented by N, so naturally all counts will be 4x as high.
What is the best approach to solve this problem (finding the top 1, 2, 3, etc. texts from M that are most similar to the text in N)?
Generally speaking, the standard technique for comparing bag-of-words vectors (i.e. your arrays) is cosine similarity. It maps your bag of n (here 5) words to an n-dimensional space in which each array is a point (and, equivalently, a vector). The most similar vectors are the ones with the smallest angle to your text N in that space; this automatically takes care of proportional rows, since they point in the same direction. Here is code for it (assuming M and N are numpy arrays of the shapes introduced in the question, without the ... row):
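import numpy as np
cos_sim = np.dot(M, N) / (np.linalg.norm(M, axis=1) * np.linalg.norm(N))  # per-row cosine similarity; all-zero rows give NaN
best = M[np.nanargmax(cos_sim)]  # nanargmax skips the NaN entries
which gives [ 4 0 8 20 40] as best for your inputs.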
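Since the question asks for the top 1, 2, 3, etc. rows, here is a minimal sketch that ranks all rows by cosine similarity. The helper name `top_k_cosine` is hypothetical, and the sketch assumes N is not all zeros; all-zero rows of M are given a similarity of -inf so that they sort last.

```python
import numpy as np

def top_k_cosine(N, M, k=1):
    """Return (row indices, similarities) of the k rows of M closest in angle to N."""
    N = np.asarray(N, dtype=float)
    M = np.asarray(M, dtype=float)
    row_norms = np.linalg.norm(M, axis=1)      # one norm per row of M
    sims = np.full(M.shape[0], -np.inf)        # all-zero rows keep -inf and sort last
    nonzero = row_norms > 0
    sims[nonzero] = (M[nonzero] @ N) / (row_norms[nonzero] * np.linalg.norm(N))
    order = np.argsort(-sims)[:k]              # indices by descending similarity
    return order, sims[order]

N = [1, 0, 2, 5, 10]
M = [[1, 0, 2, 0, 5],
     [0, 0, 0, 0, 0],
     [2, 0, 0, 0, 20],
     [4, 0, 8, 20, 40]]
order, sims = top_k_cosine(N, M, k=3)
print(order, sims)
```

The proportional row (index 3) comes first with a similarity of 1, which is the behaviour the question asks for.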
You can normalise your row counts to remove the length effect as you discussed. Row normalisation of M can be done as M / M.sum(axis=1)[:, np.newaxis]. The residual values can then be calculated as the sum of the square difference between N and M per row. The minimum difference (ignoring NaN or inf values obtained if the row sum is 0) is then the most similar.
Here is an example:
import numpy as np
N = np.array([1,0,2,5,10])
M = np.array([[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40]])
# sqrt of sum of normalised square differences
similarity = np.sqrt(np.sum((M / M.sum(axis=1)[:, np.newaxis] - N / np.sum(N))**2, axis=1))
# rows that sum to 0 produce NaN (0/0); make them infinitely distant so argmin never picks them
similarity[np.isnan(similarity)] = np.inf
result = M[similarity.argmin()]
result
>>> array([ 4, 0, 8, 20, 40])
You could then use np.argsort(similarity)[:n] to get the n most similar rows.

Flop count for variable initialization

Consider the following pseudo code:
a <- [0,0,0] (initializing a 3d vector to zeros)
b <- [0,0,0] (initializing a 3d vector to zeros)
c <- a . b (Dot product of two vectors)
In the above pseudo code, what is the flop count (i.e. the number of floating point operations)?
More generally, what I want to know is whether initialization of variables counts towards the total floating point operations or not, when looking at an algorithm's complexity.
In your case, both vectors a and b are all zeros, and I don't think zero vectors are a good way to explain flop counting.
Instead, take a vector a with entries a1, a2 and a3, and a vector b with entries b1, b2 and b3. The dot product of the two vectors is a^T b, which gives
a^T b = a1*b1 + a2*b2 + a3*b3
Here we have 3 multiplications (i.e. a1*b1, a2*b2, a3*b3) and 2 additions. In total we have 5 operations, or 5 flops.
To generalize this example to n-dimensional vectors a and b, we would have n multiplications and n-1 additions. In total we would end up with n + (n-1) = 2n-1 operations, or flops.
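The 2n-1 count can be checked with a small instrumented dot product. This is just an illustrative sketch (the function name `dot_with_flop_count` is made up here); it counts only multiplications and additions, following the convention used in the derivation above.

```python
def dot_with_flop_count(a, b):
    """Return (dot product, number of multiplications and additions performed)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    total = a[0] * b[0]          # first product: 1 multiplication, no addition
    flops = 1
    for x, y in zip(a[1:], b[1:]):
        total += x * y           # each further term: 1 multiplication + 1 addition
        flops += 2
    return total, flops

value, flops = dot_with_flop_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(value, flops)
```

For the 3-dimensional case this reports 5 flops, matching 2n-1 with n = 3.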
I hope the example I used above gives you the intuition.

Infinite subtraction loop

The problem is as follows:
Suppose a and b are two integers.
We say that (a,b) is infinite if we can repeat the following function forever:
if a > b:
    a = a - b
    b = 2*b
if b > a:
    b = b - a
    a = 2*a
and a will never be equal to b in any iteration.
Is there a way to test whether two integers are infinite without resorting to loops?
Example 1: a=1, b=4
(1, 4) -> (2, 3) -> (4, 1) -> (3, 2) -> (1, 4) and so on ===> Infinite
Example 2: a=3, b=5
(3, 5) -> (6, 2) -> (4, 4) ====> Not Infinite
Let's rewrite your function as follows (both assignments use the old values of a and b, and the order within the pair is ignored):
a=|a-b|
b=2*min(a,b)
Now we can see that if the difference between a and b is odd, then a will be odd in the next round (we just said the difference was odd), and b will be even (2*k is always even). Therefore the difference will be odd in the next iteration, and in all future iterations, so a will never equal b and the pair is "infinite".
If the difference between a and b is even, then this will be true for all future iterations (by similar logic). To work on this case, let's work backwards from the final state. If the pair is finite, then the end of the iterative process will produce a pair (k,k). The smaller integer in the prior iteration therefore was k/2, so the larger one was 3k/2: (k/2,3k/2). Continuing to follow the process backwards, the prior step then was (k/4,k/4+3k/2)=(k/4,7k/4), preceded by (k/8,k/8+7k/4)=(k/8,15k/8). If we abstract this pattern, we can see that any pair of the form (m,(2^n-1)m) (for any n) is finite.
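This isn't a complete proof, and it covers only one direction: any pair in which one member is the product of the other and a value one less than a power of two is finite. The converse does not hold. Example 2 in the question, (3,5), has an even difference and reaches (4,4), yet neither number is a multiple of the other. The backward search follows only one predecessor at each step, although a pair can have two: (6,2) is reached from (7,1) and also from (3,5).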
Here's the beginning of the brute-force process I used to come to that conclusion:
(1,1) <- impossible
(2,2) <- (3,1)
(3,3) <- impossible
(4,4) <- (6,2) <- (7,1)
(6,6) <- (9,3)
(8,8) <- (12,4) <- (14,2) <- (15,1)
(10,10) <- (15,5)
(12,12) <- (18,6) <- (21,3)
(14,14) <- (21,7)
(16,16) <- (24,8) <- (28,4) <- (30,2) <- (31,1)
[*3] [*7] [*15] [*31]
(18,18) <- (27,9)
(20,20) <- (30,10) <- (35,5)
Thanks, this was fun to work through.
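Conjectures like the one above can be tested by brute force. Because a+b never changes, only finitely many pairs are reachable, so a repeated pair proves the process is infinite. The sketch below assumes positive integers. The loop-free test in it is not from this thread: it is a candidate criterion (finite exactly when (a+b)/gcd(a,b) is a power of two), offered here as an assumption and checked only against the simulation on a small range. Both function names are hypothetical.

```python
from math import gcd

def is_infinite_by_simulation(a, b):
    """Run the process until a == b (finite) or a pair repeats (infinite)."""
    seen = set()
    while a != b:
        if (a, b) in seen:
            return True           # we are in a cycle, so a never equals b
        seen.add((a, b))
        if a > b:
            a, b = a - b, 2 * b
        else:
            a, b = 2 * a, b - a
    return False

def is_infinite_closed_form(a, b):
    """Assumed criterion: finite iff (a + b) / gcd(a, b) is a power of two."""
    q = (a + b) // gcd(a, b)
    return q & (q - 1) != 0       # true when q is NOT a power of two

print(is_infinite_by_simulation(1, 4), is_infinite_closed_form(1, 4))
print(is_infinite_by_simulation(3, 5), is_infinite_closed_form(3, 5))
```

On the two examples from the question, both functions classify (1,4) as infinite and (3,5) as not infinite.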
Consider the first half of the function, where a>b. If this remains true iteration after iteration, we have:
a_n = a_0 - (2^n - 1)*b_0
b_n = 2^n * b_0
The "not infinite" condition is a_n = b_n for some n. So
a_0 = (2^(n+1) - 1)*b_0
The second half of the function is just a swap(a,b) when, for some k, a_k < b_k:
a_0 < b_0*(2^(k+1) - 1)
The problem becomes finding whether a_0 is a multiple of b_0 with the factor (2^(k+1) - 1) for some k.
This is another problem. You can discard any pair (a,b) when a = 2^k * b. You only have to test several k while a > 2^k * b.
Note
I have not checked all my maths. You may find errors. The general idea remains.
Note 2
@Nvioli found an error in the factor originally given here. The right factor is (2^(k+1) - 1), as used above.

Np.where function

I've got a little problem understanding the where function in numpy.
The ‘times’ array contains the discrete epochs at which GPS measurements exist (rounded to the nearest second).
The ‘locations’ array contains the discrete values of the latitude, longitude and altitude of the satellite interpolated from 10 seconds intervals to 1 second intervals at the ‘times’ epochs.
The ‘tracking’ array contains an array for each epoch in ‘times’ (array within an array). The arrays have 5 columns and 32 rows. The 32 rows correspond to the 32 satellites of the GPS constellation. The 0th row corresponds to the 1st satellite, the 31st to the 32nd. The columns contain the following (in order): is the satellite tracked (0), is L1 locked (1), is L2 locked (2), is L1 unexpectedly lost (3), is L2 unexpectedly lost (4).
We need to find all the unexpected losses and put them in an array so we can plot it on a map.
What we tried to do is:
import numpy as np
i = 0
with np.load(r'folderpath\%i.npz' % i) as oneday_data:  # replace folderpath with your directory
    times = oneday_data['times']
    locations = oneday_data['locations']
    tracking = oneday_data['tracking']
A = np.where(tracking[:][:][4] == 1)
This should give us all the positions of the losses. With these indices it is easy to get the right locations. But it keeps returning useless data.
Can someone help us?
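I think the problem is your dual slices: each [:] just returns the whole array again, so tracking[:][:][4] is the same as tracking[4], i.e. the data for the epoch at index 4 only. Further, having an array of arrays could lead to weird problems (I assume you mean an object array of 2D arrays).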
So I think you need to dstack tracking into a 3D array, then do where on that. If the array is already 3D, then you can skip the dstack part. This will get the places where L2 is unexpectedly lost, which is what you did in your example:
tracking3d = np.dstack(tracking)
A0, A2 = np.where(tracking3d[:, 4, :]==1)
A0 is the position of the 1 along axis 0 (satellite), while A2 is the position of the same 1 along axis 2 (time epoch).
If the values of tracking can only be 0 or 1, you can simplify this by just doing np.where(tracking3d[:, 4, :]).
You can also roll the axes back into the configuration you were using (0: time epoch, 1: satellite, 2: tracking status)
tracking3d = np.rollaxis(np.dstack(tracking), 2, 0)
A0, A1 = np.where(tracking3d[:, :, 4]==1)
If you want to find the locations where L1 or L2 are unexpectedly lost, you can do this:
tracking3d = np.rollaxis(np.dstack(tracking), 2, 0)
A0, A1, _ = np.where(tracking3d[:, :, 3:]==1)
In this case it is the same, except there is a dummy variable _ used for the location along the last axis, since you don't care whether it was lost for L1 or L2 (if you do care, you could just do np.where independently for each column).
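The difference between chained slices and a single multi-axis index can be seen on a small synthetic array. This sketch uses made-up stand-in data (the shapes, loss positions and timestamps are all hypothetical) and assumes tracking is already a regular 3D array of shape (epochs, 32, 5); an object array of 2D arrays would first need to be stacked as described above.

```python
import numpy as np

# Hypothetical stand-in for the real data: 6 epochs, 32 satellites, 5 status columns
tracking = np.zeros((6, 32, 5), dtype=int)
tracking[1, 7, 4] = 1     # L2 unexpectedly lost at epoch 1, satellite index 7
tracking[5, 30, 4] = 1    # L2 unexpectedly lost at epoch 5, satellite index 30
times = np.arange(100, 106)  # made-up epoch timestamps

# Each [:] returns the whole array, so the chained form selects epoch 4 only
chained_is_epoch_4 = np.array_equal(tracking[:][:][4], tracking[4])

# Index all three axes in one bracket to pick column 4 for every epoch and satellite
epoch_idx, sat_idx = np.where(tracking[:, :, 4] == 1)
loss_times = times[epoch_idx]   # epochs at which the losses occurred
print(chained_is_epoch_4, epoch_idx, sat_idx, loss_times)
```

The same epoch_idx could be used to index the locations array, assuming it has one row per epoch in times.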

question about inversion

I read on a site that an inversion is a pair of indices i<j with A[i]>A[j], and the site has some exercises about this. I have a lot of questions, but I want to ask just one of them first, and then I will try the other exercises by myself!
Exercise: Which permutation of the array (1, 2, ..., n) has the highest number of inversions? What are they?
Thanks
Clearly N, ..., 2, 1 has the highest number of inversions. Every pair is an inversion. For example for N = 6, we have 6 5 4 3 2 1. The inversions are 6-5, 6-4, 6-3, 6-2, 6-1, 5-4, 5-3 and so on. Their number is N * (N - 1) / 2.
Well, the identity permutation (1,2,...,n) has no inversions. Since an inversion is a pair of elements that are in reverse order than their indices, the answer probably involves some reversal of that permutation.
I have never heard the term inversion used in this way.
A decreasing array of length N, for N>0, has 1/2*N*(N-1) pairs i<j with A[i]>A[j]. This is the maximum possible.
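For small n, the claim can be confirmed by exhaustive search. This is a brute-force sketch for illustration only (it is quadratic per permutation and factorial overall, so it is not meant for large n); the function name `count_inversions` is an assumption.

```python
from itertools import permutations

def count_inversions(seq):
    """Count pairs i < j with seq[i] > seq[j]."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )

n = 5
# Try every permutation of (1, ..., n) and keep the one with the most inversions
best = max(permutations(range(1, n + 1)), key=count_inversions)
print(best, count_inversions(best), n * (n - 1) // 2)
```

For n = 5 the search returns the decreasing permutation, and its inversion count equals n*(n-1)/2.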