How to perform matching between two sequences? - cntk

I have two mini-batches of sequences:
a = C.sequence.input_variable((10))
b = C.sequence.input_variable((10))
Both a and b have variable-length sequences.
I want to do matching between them, where matching is defined as: match (e.g. dot product) the token at each time step of a with the token at every time step of b.
How can I do this?

I have mostly answered this on GitHub, but to be consistent with SO rules I am including a response here. In the case of something simple like a dot product, you can take advantage of the fact that it factorizes nicely, so the following code works:
axisa = C.Axis.new_unique_dynamic_axis('a')
axisb = C.Axis.new_unique_dynamic_axis('b')
a = C.sequence.input_variable(1, sequence_axis=axisa)
b = C.sequence.input_variable(1, sequence_axis=axisb)
c = C.sequence.broadcast_as(C.sequence.reduce_sum(a), b) * b
c.eval({a: [[1, 2, 3],[4, 5]], b: [[6, 7], [8]]})
[array([[ 36.],
        [ 42.]], dtype=float32), array([[ 72.]], dtype=float32)]
In the general case you need the following steps
static_b, mask = C.sequence.unpack(b, neutral_value).outputs
scores = your_score(a, static_b)
The first line converts the b sequence into a static tensor with one more axis than b. Because of packing, some elements of this tensor will be invalid, and those are indicated by the mask. The neutral_value is placed as a dummy value in the static_b tensor wherever data was missing. Depending on your score, you may be able to arrange for the neutral_value not to affect the final score (e.g. if your score is a dot product, 0 is a good choice; if it involves a softmax, -infinity or something close to it is a good choice).

The second line then has access to each element of a and to all the elements of b along the first axis of static_b. For a dot product, static_b is a matrix and one element of a is a vector, so a matrix-vector multiplication results in a sequence whose elements are all the inner products between the corresponding element of a and all the elements of b.
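For concreteness, here is a hedged sketch of those two steps with a dot-product score; the times call stands in for your_score and is an assumption, and exact broadcasting details may vary across CNTK versions:

import cntk as C

axisa = C.Axis.new_unique_dynamic_axis('a')
axisb = C.Axis.new_unique_dynamic_axis('b')
a = C.sequence.input_variable(10, sequence_axis=axisa)
b = C.sequence.input_variable(10, sequence_axis=axisb)

# Step 1: unpack b into a static tensor of shape (max_len_b, 10);
# 0 is a neutral padding value for a dot-product score.
static_b, mask = C.sequence.unpack(b, 0).outputs

# Step 2: broadcast the static tensor along a's dynamic axis, then take a
# matrix-vector product at every step of a. scores has, at each step t of a,
# one dot product per element of b (mask marks which entries are valid).
scores = C.times(C.sequence.broadcast_as(static_b, a), a)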

Related

Efficiently get first N numbers that satisfy a condition in each row in a pytorch/numpy tensor

Given a tensor b, I would like to extract N elements in each row that satisfy a specific condition. For example, suppose a is a matrix that indicates whether an element in b satisfies the condition or not. Now, I would like to extract N elements in each row whose corresponding value in a is 1.
There can be two scenarios: (1) I just extract the first N such elements in each row, in order; (2) among all the elements that satisfy the condition, I randomly sample N elements in each row.
Is there an efficient way to achieve these two cases in pytorch or numpy? Thanks!
Below I give an example that shows the first case.
import torch
# given
a = torch.tensor([[1, 0, 0, 1, 1, 1], [0, 1, 0, 1, 1, 1], [1, 1, 1, 1, 1, 0]])
b = torch.arange(18).view(3, 6)
# suppose N = 3
# expected output:
c = torch.tensor([[0, 3, 4], [7, 9, 10], [12, 13, 14]])
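For case (1), a hedged, loop-free sketch (assuming every row of a has at least N ones):

import torch

a = torch.tensor([[1, 0, 0, 1, 1, 1], [0, 1, 0, 1, 1, 1], [1, 1, 1, 1, 1, 0]])
b = torch.arange(18).view(3, 6)
N = 3

# A stable descending sort of the mask moves the columns where a == 1 to the
# front of each row while keeping their original left-to-right order.
idx = torch.sort(a, dim=1, descending=True, stable=True).indices
c = torch.gather(b, 1, idx[:, :N])
# tensor([[ 0,  3,  4],
#         [ 7,  9, 10],
#         [12, 13, 14]])

For case (2), one idea is to perturb the mask before sorting (e.g. sort a + 0.5 * torch.rand(a.shape) descending): qualifying elements still land ahead of non-qualifying ones, but in random relative order.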

Aligning words in 2 sentences using a numpy 2d array

Given 2 sentences, I need to align their words based on the best similarity match between them.
For instance, consider 2 sentences:
sent1 = "John saw Mary" # 3 tokens
sent2 = "All the are grown by farmers" # 6 tokens
Here, for each token in sent1, I need to find the most similar token in sent2. Further, if a token in sent2 is already matched with a token in sent1, then it cannot be matched with another token in sent1.
For this purpose, I use a similarity matrix between the tokens of the two sentences, as given below:
cosMat = np.array([[0.1656948 , 0.16653526, 0.13380264, 0.09286133, 0.16262592, 0.14392284],
                   [0.40876892, 0.46331584, 0.28574535, 0.34924293, 0.2480594 , 0.25846344],
                   [0.15394737, 0.10269377, 0.12189645, 0.09426117, 0.09631223, 0.10549664]], dtype=np.float32)
cosMat is a 2d ndarray of shape (3, 6) which contains the cosine similarity scores between the tokens of the two sentences.
np.argmax would provide the following output:
np.argmax(cosMat, axis=1)
array([1, 1, 0])
However this is not a valid solution, as both the first and second tokens of sent1 align with the second token of sent2.
Instead I chose to do the following:
sortArr = np.dstack(np.unravel_index(np.argsort(-cosMat.ravel()), cosMat.shape))
rowSet = set()
colSet = set()
matches = list()
for item in sortArr[0]:
    if item[1] not in colSet:
        if item[0] not in rowSet:
            matches.append((item[0], item[1], cosMat[item[0], item[1]]))
            colSet.add(item[1])
            rowSet.add(item[0])
matches
This gives the following output, which is the desirable output:
[(1, 1, 0.46331584), (0, 0, 0.1656948), (2, 2, 0.121896446)]
My question is: is there a more efficient way to achieve what I have done with the code above?
Here's an alternative; it requires you to copy the initial similarity matrix. Every time you find the best match, you discard the two tokens of the pair by zeroing out the corresponding row and column in the copied matrix. This ensures a token cannot appear in multiple pairs.
res = []
mat = np.copy(cosMat)
for _ in range(mat.shape[0]):
    i, j = np.unravel_index(mat.argmax(), mat.shape)
    res.append((i, j, mat[i, j]))
    mat[i, :], mat[:, j] = 0, 0
Will return:
[(1, 1, 0.46331584), (0, 0, 0.1656948), (2, 2, 0.12189645)]
However, considering you only use np.argsort once, yours will most probably be faster.
Otherwise, I would rewrite your version, for conciseness, as:
sortArr = zip(*np.unravel_index(np.argsort(-cosMat.ravel()), cosMat.shape))
matches = []
rows, cols = set(), set()
for x, y in sortArr:
    if x not in rows and y not in cols:
        matches.append((x, y, cosMat[x, y]))
        rows.add(x)
        cols.add(y)
You could use a single set instead of two, by using some kind of prefix for the indices in order to distinguish rows from columns. Here again I'm not sure there's much gain in doing so:
matches = []
matched = set()
for x, y in sortArr:
    if 'i%i' % x not in matched and 'j%i' % y not in matched:
        matches.append((x, y, cosMat[x, y]))
        matched.update(['i%i' % x, 'j%i' % y])
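As a hedged aside, the greedy best-first matching above does not necessarily maximize the total similarity. If a globally optimal assignment is wanted instead (a slightly different objective than the question's), SciPy's Hungarian-algorithm implementation handles the rectangular case directly:

from scipy.optimize import linear_sum_assignment

# linear_sum_assignment minimizes cost, so negate to maximize similarity.
row_ind, col_ind = linear_sum_assignment(-cosMat)
matches = [(i, j, cosMat[i, j]) for i, j in zip(row_ind, col_ind)]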

Numpy index array of unknown dimensions?

I need to compare a bunch of numpy arrays with different dimensions, say:
a = np.array([1,2,3])
b = np.array([1,2,3],[4,5,6])
assert(a == b[0])
How can I do this if I know neither the shape of a nor that of b, besides that
len(shape(a)) == len(shape(b)) - 1
and I also do not know which dimension of b to skip? I'd like to use np.index_exp, but that does not seem to help me ...
def compare_arrays(a, b, skip_row):
    u = np.index_exp[ ... ]
    assert(a[:] == b[u])
Edit
Or to put it otherwise, I want to construct a slicing expression given the shape of the array and the dimension I want to skip. How do I dynamically create the np.index_exp if I know the number of dimensions and the positions where to put ":" and where to put "0"?
I was just looking at the code for apply_along_axis and apply_over_axes, studying how they construct indexing objects.
Let's make a 4d array:
In [355]: b=np.ones((2,3,4,3),int)
Make a list of slices (using list replication with *):
In [356]: ind=[slice(None)]*b.ndim
In [357]: b[ind].shape # same as b[:,:,:,:]
Out[357]: (2, 3, 4, 3)
In [358]: ind[2]=2 # replace one slice with index
In [359]: b[ind].shape # a slice, indexing on the third dim
Out[359]: (2, 3, 3)
Or with your example
In [361]: b = np.array([1,2,3],[4,5,6]) # missing []
...
TypeError: data type not understood
In [362]: b = np.array([[1,2,3],[4,5,6]])
In [366]: ind=[slice(None)]*b.ndim
In [367]: ind[0]=0
In [368]: a==b[ind]
Out[368]: array([ True, True, True], dtype=bool)
This indexing is basically the same as np.take, but the same idea can be extended to other cases.
I don't quite follow your question about the use of :. Note that when building an indexing list I use slice(None). The interpreter translates all indexing : into slice objects: [start:stop:step] => slice(start, stop, step).
Usually you don't need to use a[:] == b[0]; a == b[0] is sufficient. With lists, alist[:] makes a copy; with arrays it does nothing (unless used on the LHS, a[:] = ...).
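One hedged version note: indexing an array with a plain list of slices like b[ind] was deprecated in NumPy 1.15 and rejected by later releases, so with a recent NumPy convert the list to a tuple first:

import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
a = np.array([1, 2, 3])

ind = [slice(None)] * b.ndim  # same construction as above
ind[0] = 0                    # index the first dimension
print(a == b[tuple(ind)])     # [ True  True  True]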

How can I replace the summing in numpy matrix multiplication with concatenation in a new dimension?

For each location in the result matrix, instead of storing the dot product of the corresponding row and column of the argument matrices, I would like to store their element-wise product, which will be a vector extending into a third dimension.
One idea would be to convert the argument matrices to vectors with vector entries, and then take their outer product, but I'm not sure how to do this either.
EDIT:
I figured it out before I saw there was a reply. Here is my solution:
def newdot(A, B):
    A = A.reshape((1,) + A.shape)
    B = B.reshape((1,) + B.shape)
    A = A.transpose(2, 1, 0)
    B = B.transpose(1, 0, 2)
    return A * B
What I am doing is taking apart each row and column pair that will have their outer product taken, and forming two lists of them, which then get their contents matrix multiplied together in parallel.
It's a little convoluted (and difficult to explain) but this function should get you what you're looking for:
def f(m1, m2):
    return m2.A.T * m1.A.reshape(m1.shape[0], 1, m1.shape[1])

# so that, for np.matrix arguments m1 and m2:
m3 = m1 * m2
m3_el = f(m1, m2)
m3[i, j] == sum(m3_el[i, j, :])  # for any valid i, j
m3 == m3_el.sum(2)
The basic idea is to turn the matrices into arrays and do element-by-element multiplication. One of the arrays gets reshaped to have a size of one in its middle dimension, and array broadcasting rules expand this dimension out to match the height of the other array.
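As a hedged aside, the same "multiply but keep the third axis" result can be written directly with np.einsum, assuming plain ndarrays rather than np.matrix:

import numpy as np

m1 = np.random.rand(2, 3)
m2 = np.random.rand(3, 4)

# out[i, j, k] = m1[i, k] * m2[k, j]: the element-wise products a matrix
# product would sum over, kept in a third axis instead of being summed.
m3_el = np.einsum('ik,kj->ijk', m1, m2)
assert np.allclose(m3_el.sum(axis=2), m1 @ m2)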

Finding matrix through optimisation

I am looking for an algorithm to solve the following problem:
I have two sets of vectors, and I want to find the matrix that best approximates the transformation from the input vectors to the output vectors.
The vectors are 3x1, so the matrix is 3x3.
This is the general problem. My particular problem is that I have a set of RGB colors, and another set that contains the desired colors. I am trying to find an RGB-to-RGB transformation that would give me colors closer to the desired ones.
There is a correspondence between the input and output vectors, so computing an error function that should be minimized is the easy part. But how can I minimize this function?
This is a classic linear algebra problem, the key phrase to search on is "multiple linear regression".
I've had to code some variation of this many times over the years. For example, code to calibrate a digitizer tablet or stylus touch-screen uses the same math.
Here's the math:
Let p be an input vector and q the corresponding output vector.
The transformation you want is a 3x3 matrix; call it A.
For a single input and output vector p and q, there is an error vector e
e = q - A x p
The square of the magnitude of the error is a scalar value:
e^T x e = (q - A x p)^T x (q - A x p)
(where the ^T operator is transpose).
What you really want to minimize is the sum of the squared error magnitudes over the sets:
E = sum(e^T x e)
This minimum satisfies the matrix equation D = 0 where
D(i,j) = the partial derivative of E with respect to A(i,j)
Say you have N input and output vectors.
Your set of input 3-vectors is a 3xN matrix; call this matrix P.
The ith column of P is the ith input vector.
So is the set of output 3-vectors; call this matrix Q.
When you grind thru all of the algebra, the solution is
A = Q x P^T x (P x P^T)^-1
(where ^-1 is the inverse operator -- sorry about no superscripts or subscripts)
Here's the algorithm:
Create the 3xN matrix P from the set of input vectors.
Create the 3xN matrix Q from the set of output vectors.
Matrix Multiply R = P x transpose (P)
Compute the inverse of R
Matrix Multiply A = Q x transpose(P) x inverse (R)
using the matrix multiplication and matrix inversion routines of your linear algebra library of choice.
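Here is a hedged NumPy sketch of those steps; the synthetic data and names are illustrative only:

import numpy as np

rng = np.random.default_rng(0)
A_true = rng.random((3, 3))        # the transform we pretend not to know
P = rng.random((3, 10))            # 3xN inputs, one vector per column
Q = A_true @ P                     # corresponding outputs

R = P @ P.T                        # step 3
A = Q @ P.T @ np.linalg.inv(R)     # steps 4-5: A = Q x P^T x (P x P^T)^-1
print(np.allclose(A, A_true))      # True, up to floating-point error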
However, a 3x3 transform matrix is capable of scaling and rotating the input vectors, but not of doing any translation! This might not be general enough for your problem. It's usually a good idea to append a "1" to the end of each 3-vector to make it a 4-vector, and then look for the best 3x4 transform matrix that minimizes the error. This can't hurt; it can only lead to a better fit of the data.
You don't specify a language, but here's how I would approach the problem in Matlab.
v1 is a 3xn matrix containing your input colors as column vectors
v2 is also a 3xn matrix, containing your output colors
You want to solve the system
M*v1 = v2
M = v2*inv(v1)
However, v1 is not directly invertible, since it is not a square matrix. Matlab solves this automatically with the mrdivide operation (M = v2/v1), where M is the least-squares best-fit solution.
eg:
>> v1 = rand(3,10);
>> M = rand(3,3);
>> v2 = M * v1;
>> v2/v1 - M
ans =
1.0e-15 *
0.4510 0.4441 -0.5551
0.2220 0.1388 -0.3331
0.4441 0.2220 -0.4441
>> (v2 + randn(size(v2))*0.1)/v1 - M
ans =
0.0598 -0.1961 0.0931
-0.1684 0.0509 0.1465
-0.0931 -0.0009 0.0213
Here is a more language-agnostic way to solve the problem. Some linear algebra should be enough:
Write the average squared difference between inputs and outputs (the sum of the squares of each difference between each input and output value). I take this as the definition of "best approximation".
This is a quadratic function of your 9 unknown matrix coefficients.
To minimize it, take its derivative with respect to each of them.
You will get a linear system of 9 equations, which you solve to obtain the solution (unique, or a whole space of solutions, depending on the input set).
When the difference function is not quadratic, you can do the same, but you have to use an iterative method to solve the resulting equation system.
This answer is better for beginners in my opinion:
Consider the following scenario:
We don't know the matrix M, but we know the input vectors I̅n and the corresponding output vectors O̅n, where n can range from 3 up.
If we had exactly 3 input vectors and 3 output vectors (for a 3x3 matrix), we could compute the coefficients α_r,c precisely; we would have a fully specified system.
But we have more than 3 vectors, and thus an overdetermined system of equations.
Let's write down these equations. We know that to get the vector O̅n, we must perform a matrix multiplication with the vector I̅n. In other words: M · I̅n = O̅n.
If we expand this operation, we get the normal equations: one scalar equation per output component, each linear in the 9 unknown coefficients.
We do not know the alphas, but we know all the rest. In fact, there are 9 unknowns but 12 equations (4 vectors with 3 components each). This is why the system is overdetermined: there are more equations than unknowns. We will approximate the unknowns using all the equations, and we will use the sum of squares to aggregate the extra equations.
So we combine the above equations into matrix form, X · b̅ = y̅, where b̅ holds the 9 unknown coefficients, X is the [12x9] matrix built from the input vectors, and y̅ stacks the 12 output components.
With some least-squares algebra magic (regression), we can solve for b̅:
b̅ = (Xᵀ · X)⁻¹ · Xᵀ · y̅
This is what is happening behind that formula:
Transposing the matrix and multiplying it by its non-transposed self creates a square matrix of lower dimension ([9x12] · [12x9] = [9x9]).
The inverse of this result allows us to solve for b̅.
Multiplying the vector y̅ by the transposed X reduces it to a [9x1] vector. Then, multiplying the [9x9] inverse by that [9x1] vector solves the system for b̅.
Now we take the [9x1] result vector and reshape it into a matrix. This is our approximated transformation matrix.
The Python code:
import numpy as np
import numpy.linalg

INPUTS = [[5, 6, 2], [1, 7, 3], [2, 6, 5], [1, 7, 5]]
OUTPUTS = [[3, 7, 1], [3, 7, 1], [3, 7, 2], [3, 7, 2]]

def get_mat(inputs, outputs, entry_len):
    n_of_vectors = len(inputs)
    noe = n_of_vectors * entry_len  # number of equations
    # We need to construct the input matrix X.
    # We linearize the unknown matrix by flattening it, e.g. [a11, a12, a21, a22].
    # So for each row of the unknown matrix we combine its variables with each input vector.
    X_mat = []
    for in_n in range(0, n_of_vectors):  # for each input vector
        # populate all flattened matrix variables: 4 for a 2x2 matrix, 9 for 3x3, and so on
        base = 0
        for col_n in range(0, entry_len):  # each row of the unknown matrix is matched with all entries of the input vector
            row = [0 for i in range(0, entry_len ** 2)]
            for entry in inputs[in_n]:
                row[base] = entry
                base += 1
            X_mat.append(row)
    Y_mat = [item for sublist in outputs for item in sublist]
    X_np = np.array(X_mat)
    Y_np = np.array([Y_mat]).T
    # b = (X^T X)^-1 X^T y
    solution = np.dot(np.dot(numpy.linalg.inv(np.dot(X_np.T, X_np)), X_np.T), Y_np)
    var_mat = solution.reshape(entry_len, entry_len)  # create the square matrix
    return var_mat

transf_mat = get_mat(INPUTS, OUTPUTS, 3)  # 3 means a 3x3 matrix and input/output vectors of size 3
print(transf_mat)

for i in range(0, len(INPUTS)):
    o = np.dot(transf_mat, np.array([INPUTS[i]]).T)
    print(f"{INPUTS[i]} x [M] = {o.T} ({OUTPUTS[i]})")
The output is as such:
[[ 0.13654096 0.35890767 0.09530002]
[ 0.31859558 0.83745124 0.22236671]
[ 0.08322497 -0.0526658 0.4417611 ]]
[5, 6, 2] x [M] = [[3.02675088 7.06241873 0.98365224]] ([3, 7, 1])
[1, 7, 3] x [M] = [[2.93479472 6.84785436 1.03984767]] ([3, 7, 1])
[2, 6, 5] x [M] = [[2.90302805 6.77373212 2.05926064]] ([3, 7, 2])
[1, 7, 5] x [M] = [[3.12539476 7.29258778 1.92336987]] ([3, 7, 2])
You can see that it took all the specified inputs, computed the transformed outputs and matched them against the reference vectors. The results are not exact, since we have an approximation to an overdetermined system. If we used INPUTS and OUTPUTS with only 3 vectors, the result would be exact.
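A hedged footnote: NumPy can do the same least-squares fit in a single call. With each input/output vector as a row, np.linalg.lstsq solves I @ M.T ≈ O, so the transpose of its solution is the matrix built above:

import numpy as np

I = np.array([[5, 6, 2], [1, 7, 3], [2, 6, 5], [1, 7, 5]], dtype=float)
O = np.array([[3, 7, 1], [3, 7, 1], [3, 7, 2], [3, 7, 2]], dtype=float)

# lstsq solves I @ X = O in the least-squares sense; X is M transposed.
M_T, residuals, rank, _ = np.linalg.lstsq(I, O, rcond=None)
print(M_T.T)  # should match transf_mat printed above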