Vectorize multivariate normal PDF in Python (PyTorch/NumPy)

I have N Gaussian distributions (multivariate) with N different means (covariance is the same for all of them) in D dimensions.
I also have N evaluation points, where I want to evaluate each of these (log) PDFs.
This means I need to get an NxN matrix, call it "kernels", where the (i,j)-th entry is the j-th Gaussian evaluated at the i-th point. A naive approach is:
from torch.distributions.multivariate_normal import MultivariateNormal
import numpy as np
# means contains all N means as rows and is thus N x D
# same for eval_points
# cov is not a problem, just a DxD matrix that is equal for all N Gaussians
kernels = np.empty((N, N))
for i in range(N):
    for j in range(N):
        kernels[i][j] = MultivariateNormal(means[j], cov).log_prob(eval_points[i])
We can get rid of one for loop easily; for example, to evaluate the first Gaussian at all points, we simply do:
MultivariateNormal(means[0], cov).log_prob(eval_points).squeeze()
and this gives us a length-N vector of values: the first Gaussian evaluated at all N points.
My problem is that, in order to get the full N x N matrix, this doesn't work:
kernels = MultivariateNormal(means, cov).log_prob(eval_points).squeeze()
It doesn't figure out that it should evaluate each mean against every point in eval_points, and it doesn't return the NxN matrix of these values, which is what I want. Therefore, I am not able to get rid of the second for loop over the N Gaussians.

You are passing wrongly shaped tensors to MultivariateNormal's constructor. You should pass a collection of mean vectors of shape (N, D) and a collection of covariance matrices of shape (N, D, D) for N D-dimensional Gaussians.
You are passing mu of shape (N, D), but your covariance matrix is not well shaped. You will need to repeat the covariance matrix N times before passing it to the MultivariateNormal constructor. Here's one way to do it.
import numpy as np
import torch
from sklearn.datasets import make_spd_matrix
from torch.distributions.multivariate_normal import MultivariateNormal

N = 10
D = 3
# means contains all N means as rows and is thus N x D
# same for eval_points
# cov is not a problem, just a DxD matrix that is equal for all N Gaussians
mu = torch.from_numpy(np.random.randn(N, D))
cov = torch.from_numpy(make_spd_matrix(D))        # random symmetric positive-definite DxD matrix
cov_n = cov[None, ...].repeat_interleave(N, 0)    # repeat the covariance N times -> (N, D, D)
assert cov_n.shape == (N, D, D)
kernels = MultivariateNormal(mu, cov_n)
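With the batched distribution built this way, the full NxN matrix of log-densities can be obtained in one call by broadcasting the evaluation points against the batch dimension. A minimal sketch, assuming eval_points is an N x D tensor like mu:

eval_points = torch.from_numpy(np.random.randn(N, D))
# Insert a singleton batch dimension: (N, 1, D) broadcasts against the batch shape (N,)
kernels = MultivariateNormal(mu, cov_n).log_prob(eval_points[:, None, :])
# kernels[i, j] is the log-density of the j-th Gaussian evaluated at the i-th point
assert kernels.shape == (N, N)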

Related

Vectorizing ARD (Automatic Relevance Determination) kernel implementation in Gaussian processes

I am trying to implement an ARD kernel with NumPy as given in the GPML book (M3 from Equation 5.2).
I am struggling to vectorize this equation for the NxM kernel computation. I have tried the following non-vectorized version. Can someone help vectorize this in NumPy/PyTorch?
import numpy as np

N = 30  # Number of data points in X1
M = 40  # Number of data points in X2
D = 6   # Number of features (ARD dimensions)

X1 = np.random.rand(N, D)
X2 = np.random.rand(M, D)
Lambda = np.random.rand(D, 1)
L_inv = np.diag(np.random.rand(D))
sigma_f = np.random.rand()

K = np.empty((N, M))
for n in range(N):
    for m in range(M):
        M3 = Lambda @ Lambda.T + L_inv**2
        d = (X1[n, :] - X2[m, :]).reshape(-1, 1)
        K[n, m] = sigma_f**2 * np.exp(-0.5 * d.T @ M3 @ d)
We can use the rules of broadcasting and the neat NumPy function einsum to vectorize array operations. In a few words, broadcasting allows us to operate on arrays in one-liners by adding new dimensions to the resulting array, while einsum allows us to perform operations on multiple arrays by explicitly working in index notation (instead of with matrices).
Luckily, no loops are necessary to calculate your kernel. Please see below the vectorized solution, the ARD_kernel function, which is about 30x faster on my machine than the original loopy version. einsum is usually about as fast as it gets, but there may be faster approaches; I haven't checked anything else (e.g. the usual @ operator instead of einsum).
Also, there is a missing term in the code (the Kronecker delta); I don't know if it was omitted on purpose (let me know if you have problems implementing it and I'll edit the answer).
import numpy as np

N = 300  # Number of data points in X1
M = 400  # Number of data points in X2
D = 6    # Number of features (ARD dimensions)

np.random.seed(1)  # Fix random seed for reproducibility

X1 = np.random.rand(N, D)
X2 = np.random.rand(M, D)
Lambda = np.random.rand(D, 1)
L_inv = np.diag(np.random.rand(D))
sigma_f = np.random.rand()

# Loopy function
def ARD_kernel_loops(X1, X2, Lambda, L_inv, sigma_f):
    K = np.empty((N, M))
    M3 = Lambda @ Lambda.T + L_inv**2
    for n in range(N):
        for m in range(M):
            d = (X1[n, :] - X2[m, :]).reshape(-1, 1)
            K[n, m] = np.exp(-0.5 * d.T @ M3 @ d)
    return K * sigma_f**2

# Vectorized function
def ARD_kernel(X1, X2, Lambda, L_inv, sigma_f):
    M3 = Lambda.squeeze() * Lambda + L_inv**2  # Use broadcasting to avoid the transpose
    d = X1[:, None] - X2[None, ...]            # Use broadcasting to avoid the loops
    # order='F' for memory layout (as your arrays are (N,M,D) instead of (D,N,M))
    return sigma_f**2 * np.exp(-0.5 * np.einsum("ijk,kl,ijl->ij", d, M3, d, order='F'))
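A quick sanity check comparing the two implementations on the arrays defined above (a small sketch; the asserts use np.allclose's default tolerances):

K_loops = ARD_kernel_loops(X1, X2, Lambda, L_inv, sigma_f)
K_vec = ARD_kernel(X1, X2, Lambda, L_inv, sigma_f)
# Both should produce the same (N, M) kernel matrix up to floating-point error
assert K_loops.shape == K_vec.shape == (N, M)
assert np.allclose(K_loops, K_vec)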
There is perhaps an additional optimisation. The example M matrices given are all positive definite. This means that the Cholesky decomposition can be applied, so that we can find an upper triangular U with
M = U'*U
The point of this is that if we apply U to the xs, i.e.
y[p] = U*x[p],   p = 1..N
then
(x[p]-x[q])'*M*(x[p]-x[q]) = (y[p]-y[q])'*(y[p]-y[q])
Thus if there are N vectors x, each of dimension d, we convert the N² operations of cost O(d²) on the left-hand side into N² operations of cost O(d) on the right-hand side. This costs an extra Cholesky decomposition (O(d³)) and N applications of U to the xs (O(d²) each).
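A minimal NumPy sketch of this idea, reusing Lambda, L_inv, X1, X2 and sigma_f from the code above. Note that np.linalg.cholesky returns a lower-triangular factor with M3 = chol @ chol.T, so applying chol.T to the points plays the role of U; scipy's cdist is used here purely for convenience:

import numpy as np
from scipy.spatial.distance import cdist

M3 = Lambda @ Lambda.T + L_inv**2       # positive definite D x D matrix, as above
chol = np.linalg.cholesky(M3)           # M3 = chol @ chol.T, chol lower triangular
Y1 = X1 @ chol                          # each row becomes chol.T @ x, the transformed point
Y2 = X2 @ chol
# (x1 - x2).T @ M3 @ (x1 - x2) equals the squared Euclidean distance of the transformed points
K_chol = sigma_f**2 * np.exp(-0.5 * cdist(Y1, Y2, metric="sqeuclidean"))
assert np.allclose(K_chol, ARD_kernel(X1, X2, Lambda, L_inv, sigma_f))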

Tensor multiplication with einsum

I have a tensor phi = np.random.rand(n, n, 3) and a matrix D = np.random.rand(3, 3). I want to multiply the matrix D along the last axis of phi so that the output has shape (n, n, 3). I have tried this
np.einsum("klj,ij->kli", phi, D)
But I am not confident in this notation at all. Basically I want to do
res = np.zeros_like(phi)
for i in range(n):
    for j in range(n):
        res[i, j, :] = D.dot(phi[i, j, :])
You are treating phi as an n, n array of vectors, each of which is to be left-multiplied by D. So you want to keep the n, n portion of the shape exactly as-is. The last (only) dimension of the vectors should be multiplied and summed with the last dimension of the matrix (the vectors are implicitly 3x1):
np.einsum('ijk,lk->ijl', phi, D)
OR
np.einsum('ij,klj->kli', D, phi)
It's likely much simpler to use broadcasting with np.matmul (the @ operator):
np.squeeze(D @ phi[..., None])
You can omit the squeeze if you don't mind the extra unit dimension at the end.
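A small check that the einsum form and the matmul form match the explicit loop from the question (a sketch, with a small n for brevity):

import numpy as np

n = 4
phi = np.random.rand(n, n, 3)
D = np.random.rand(3, 3)

# Explicit loop from the question
res = np.zeros_like(phi)
for i in range(n):
    for j in range(n):
        res[i, j, :] = D.dot(phi[i, j, :])

assert np.allclose(res, np.einsum('ijk,lk->ijl', phi, D))
assert np.allclose(res, np.squeeze(D @ phi[..., None], axis=-1))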

Why is there a list after the covariance function for numpy?

Whenever I'm finding covariance of 2 arrays, I've always seen it done like
(np.cov(X,Y)[0,1])
What purpose does the [0,1] serve?
For two 1d arrays x and y, np.cov(x, y) returns:
np.array([[variance(x),      covariance(x, y)],
          [covariance(y, x), variance(y)     ]])
Thus for the covariance, you need the [0,1] term.
When called as np.cov(x, y), NumPy effectively computes np.cov(X) where X = np.stack((x, y), axis=0).
The confusion arises because np.cov(X) is really designed for many variables at once: with X.shape = (m, n), np.cov(X)[i, j] for i, j < m is the covariance between rows i and j, and the covariance of row i with itself is just the variance of row i.
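A small sketch illustrating the 2x2 layout (the arrays here are arbitrary, just for illustration):

import numpy as np

x = np.random.rand(100)
y = np.random.rand(100)

C = np.cov(x, y)  # 2x2: [[var(x), cov(x, y)], [cov(y, x), var(y)]]
assert C.shape == (2, 2)
assert np.isclose(C[0, 0], np.var(x, ddof=1))  # diagonal entries are (sample) variances
assert np.isclose(C[0, 1], C[1, 0])            # off-diagonal entries are the covariance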

Evaluating the pairwise euclidean distance between multi-dimensional inputs in TensorFlow

I have two 2-D tensors of shape say m X d and n X d. What is the optimized (i.e. without for loops) or TensorFlow way of evaluating the pairwise Euclidean distance between these two tensors so that I get an output tensor of shape m X n? I need it for creating the squared term of a Gaussian kernel, ultimately to build a covariance matrix of size m x n.
The equivalent unoptimized numpy code would look like this
difference_squared = np.zeros((x.shape[0], x_.shape[0]))
for row_iterator in range(difference_squared.shape[0]):
    for column_iterator in range(difference_squared.shape[1]):
        difference_squared[row_iterator, column_iterator] = np.sum(np.power(x[row_iterator] - x_[column_iterator], 2))
I found the answer by taking help from here. Assuming the two tensors are x1 and x2, and their dimensions are m X d and n X d, their pairwise (squared) Euclidean distance is given by
tile_1 = tf.tile(tf.expand_dims(x1, 0), [n, 1, 1])   # shape (n, m, d)
tile_2 = tf.tile(tf.expand_dims(x2, 1), [1, m, 1])   # shape (n, m, d)
pairwise_euclidean_distance = tf.reduce_sum(tf.square(tf.subtract(tile_1, tile_2)), 2)
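For reference, the same pairwise squared distances can be computed without explicit tiling by relying on broadcasting alone; a NumPy sketch of the idea (x1 is m x d, x2 is n x d, result is m x n):

import numpy as np

m, n, d = 5, 7, 3
x1 = np.random.rand(m, d)
x2 = np.random.rand(n, d)

# (m, 1, d) - (1, n, d) broadcasts to (m, n, d); summing over the last axis gives (m, n)
difference_squared = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
assert difference_squared.shape == (m, n)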

Finding a matrix through optimisation

I am looking for an algorithm to solve the following problem:
I have two sets of vectors, and I want to find the matrix that best approximates the transformation from the input vectors to the output vectors.
The vectors are 3x1, so the matrix is 3x3.
This is the general problem. My particular problem is that I have a set of RGB colors and another set that contains the desired colors. I am trying to find an RGB-to-RGB transformation that would give me colors closer to the desired ones.
There is a correspondence between the input and output vectors, so computing an error function to minimize is the easy part. But how can I minimize this function?
This is a classic linear algebra problem, the key phrase to search on is "multiple linear regression".
I've had to code some variation of this many times over the years. For example, code to calibrate a digitizer tablet or stylus touch-screen uses the same math.
Here's the math:
Let p be an input vector and q the corresponding output vector.
The transformation you want is a 3x3 matrix; call it A.
For a single input and output vector p and q, there is an error vector e
e = q - A x p
The square of the magnitude of the error is a scalar value:
e^T x e = (q - A x p)^T x (q - A x p)
(where ^T denotes transpose).
What you really want to minimize is the sum of the squared errors over the whole set:
E = sum (e^T x e)
This minimum satisfies the matrix equation D = 0 where
D(i,j) = the partial derivative of E with respect to A(i,j)
Say you have N input and output vectors.
Your set of input 3-vectors is a 3xN matrix; call this matrix P.
The ith column of P is the ith input vector.
So is the set of output 3-vectors; call this matrix Q.
When you grind thru all of the algebra, the solution is
A = Q x P^T x (P x P^T)^-1
(where ^-1 is the inverse operator -- sorry about the lack of proper superscripts and subscripts)
Here's the algorithm:
Create the 3xN matrix P from the set of input vectors.
Create the 3xN matrix Q from the set of output vectors.
Matrix Multiply R = P x transpose (P)
Compute the inverse of R
Matrix Multiply A = Q x transpose(P) x inverse (R)
using the matrix multiplication and matrix inversion routines of your linear algebra library of choice.
However, a 3x3 linear transform matrix can scale and rotate the input vectors, but it cannot do any translation! This might not be general enough for your problem. It's usually a good idea to append a "1" to the end of each of the 3-vectors to make them 4-vectors, and look for the best 3x4 transform matrix that minimizes the error. This can't hurt; it can only lead to a better fit of the data.
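A NumPy sketch of this recipe (not the answerer's code): the homogeneous-coordinate variant with the appended 1 is shown, and np.linalg.lstsq is used instead of forming the inverse explicitly, which is a numerically safer equivalent of A = Q x P^T x (P x P^T)^-1. The helper name fit_color_transform is hypothetical:

import numpy as np

def fit_color_transform(inputs, outputs):
    # inputs, outputs: arrays of shape (N, 3), one RGB vector per row.
    # Returns the 3x4 matrix A minimizing the summed squared error ||Q - A P||^2.
    P = np.hstack([inputs, np.ones((len(inputs), 1))]).T  # 4 x N: input vectors with an appended 1
    Q = np.asarray(outputs, dtype=float).T                 # 3 x N: output vectors
    # Solve P^T A^T = Q^T in the least-squares sense
    A_T, *_ = np.linalg.lstsq(P.T, Q.T, rcond=None)
    return A_T.T

# Hypothetical usage with random data
rgb_in = np.random.rand(20, 3)
rgb_out = np.random.rand(20, 3)
A = fit_color_transform(rgb_in, rgb_out)
rgb_pred = (A @ np.hstack([rgb_in, np.ones((20, 1))]).T).T  # apply the fitted transform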
You don't specify a language, but here's how I would approach the problem in Matlab.
v1 is a 3xn matrix, containing your input colors in vertical vectors
v2 is also a 3xn matrix containing your output colors
You want to solve the system
M*v1 = v2
M = v2*inv(v1)
However, v1 is not directly invertible, since it's not a square matrix. Matlab will solve this automatically with the mrdivide operation (M = v2/v1), where M is the best fit solution.
eg:
>> v1 = rand(3,10);
>> M = rand(3,3);
>> v2 = M * v1;
>> v2/v1 - M
ans =
1.0e-15 *
0.4510 0.4441 -0.5551
0.2220 0.1388 -0.3331
0.4441 0.2220 -0.4441
>> (v2 + randn(size(v2))*0.1)/v1 - M
ans =
0.0598 -0.1961 0.0931
-0.1684 0.0509 0.1465
-0.0931 -0.0009 0.0213
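If you'd rather stay in Python, the NumPy equivalent of Matlab's mrdivide here is a least-squares solve; a sketch under the same setup (v1 and v2 are 3xN arrays of input and output colors):

import numpy as np

v1 = np.random.rand(3, 10)   # input colors, one per column
M_true = np.random.rand(3, 3)
v2 = M_true @ v1             # output colors

# v2 / v1 in Matlab corresponds to solving v1.T @ M.T = v2.T in the least-squares sense
M_fit, *_ = np.linalg.lstsq(v1.T, v2.T, rcond=None)
M_fit = M_fit.T
print(M_fit - M_true)        # close to zero for noiseless data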
A more language-agnostic way to approach the problem:
Some linear algebra should be enough:
Write down the average squared difference between inputs and outputs (the sum of the squares of each difference between each input and output value); I take this as the definition of "best approximate".
This is a quadratic function of your 9 unknown matrix coefficients.
To minimize it, derive it with respect to each of them.
You will get a linear system of 9 equations to solve for the solution (unique, or a whole affine subspace of solutions, depending on the input set).
When the difference function is not quadratic, you can do the same but you have to use an iterative method to solve the equation system.
This answer is better for beginners in my opinion:
Consider the following scenario:
We don't know the matrix M, but we know each input vector I̅n and its corresponding output vector O̅n, where n can range from 3 and up.
If we had exactly 3 input vectors and 3 output vectors (for a 3x3 matrix), we could compute the coefficients α(r,c) precisely; we would have a fully specified system.
But we have more than 3 vector pairs, and thus we have an overdetermined system of equations.
Let's write down these equations, say for the four example vector pairs used in the code below.
We know that to get the vector O̅n, we must multiply the matrix M with the vector I̅n; in other words: M · I̅n = O̅n.
If we expand this operation for one pair (I̅, O̅), we get three scalar equations:
α(1,1)·i1 + α(1,2)·i2 + α(1,3)·i3 = o1
α(2,1)·i1 + α(2,2)·i2 + α(2,3)·i3 = o2
α(3,1)·i1 + α(3,2)·i2 + α(3,3)·i3 = o3
We do not know the alphas, but we know everything else. With four pairs there are 9 unknowns but 12 equations; this is why the system is overdetermined: there are more equations than unknowns. We will approximate the unknowns using all the equations, and we will use the sum of squares to aggregate the surplus equations.
So we combine the above equations into the matrix form X · b̅ = y̅, where b̅ stacks the 9 unknown alphas, y̅ stacks the 12 output components, and each row of the 12x9 matrix X contains one input vector placed in the columns corresponding to one row of M (this is exactly what the code below constructs).
And with some least squares algebra magic (regression), we can solve for b̅:
b̅ = (Xᵀ · X)⁻¹ · Xᵀ · y̅
This is what is happening behind that formula:
Multiplying the transposed matrix with its non-transposed counterpart creates a square matrix of lower dimension ([9x12] · [12x9] = [9x9]).
The inverse of this result allows us to solve for b̅.
Multiplying the vector y̅ by Xᵀ reduces it to a [9x1] vector; then, multiplying the [9x9] inverse with this [9x1] vector solves the system for b̅.
Finally, we take the [9x1] result vector and reshape it into a square matrix. This is our approximated transformation matrix.
The Python code:
import numpy as np
import numpy.linalg

INPUTS = [[5,6,2],[1,7,3],[2,6,5],[1,7,5]]
OUTPUTS = [[3,7,1],[3,7,1],[3,7,2],[3,7,2]]

def get_mat(inputs, outputs, entry_len):
    n_of_vectors = len(inputs)
    noe = n_of_vectors * entry_len  # Number of equations
    # We need to construct the input matrix X.
    # The unknown matrix is linearized (flattened), e.g. [a11, a12, a21, a22] for a 2x2 matrix,
    # so for each row of the unknown matrix we combine that row's variables with each input vector.
    X_mat = []
    for in_n in range(0, n_of_vectors):  # For each input vector
        # populate all flattened matrix variables: 4 variables for a 2x2 matrix, 9 for 3x3, and so on
        base = 0
        for col_n in range(0, entry_len):  # Each row of the unknown matrix is matched to all entries of the input vector
            row = [0 for i in range(0, entry_len ** 2)]
            for entry in inputs[in_n]:
                row[base] = entry
                base += 1
            X_mat.append(row)
    Y_mat = [item for sublist in outputs for item in sublist]

    X_np = np.array(X_mat)
    Y_np = np.array([Y_mat]).T
    # Least squares solution: b = (X^T X)^-1 X^T y
    solution = np.dot(np.dot(numpy.linalg.inv(np.dot(X_np.T, X_np)), X_np.T), Y_np)
    var_mat = solution.reshape(entry_len, entry_len)  # create the square matrix
    return var_mat

transf_mat = get_mat(INPUTS, OUTPUTS, 3)  # 3 means a 3x3 matrix, and in/out vector size 3
print(transf_mat)

for i in range(0, len(INPUTS)):
    o = np.dot(transf_mat, np.array([INPUTS[i]]).T)
    print(f"{INPUTS[i]} x [M] = {o.T} ({OUTPUTS[i]})")
The output is as such:
[[ 0.13654096 0.35890767 0.09530002]
[ 0.31859558 0.83745124 0.22236671]
[ 0.08322497 -0.0526658 0.4417611 ]]
[5, 6, 2] x [M] = [[3.02675088 7.06241873 0.98365224]] ([3, 7, 1])
[1, 7, 3] x [M] = [[2.93479472 6.84785436 1.03984767]] ([3, 7, 1])
[2, 6, 5] x [M] = [[2.90302805 6.77373212 2.05926064]] ([3, 7, 2])
[1, 7, 5] x [M] = [[3.12539476 7.29258778 1.92336987]] ([3, 7, 2])
You can see that it took all the specified inputs, produced the transformed outputs, and matched them against the reference vectors. The results are not exact, since they are an approximation to the overdetermined system. If we used INPUTS and OUTPUTS with only 3 vectors each, the result would be exact.