A pure pythonic pairwise Euclidean distance of rows of a numpy ndarray

I have a matrix of size (n_classes, n_features) and I want to compute the pairwise Euclidean distance of each pair of classes, so the output would be an (n_classes, n_classes) matrix where each cell holds the value of euclidean_distance(class_i, class_j).
I know there are scipy's spatial distances (http://docs.scipy.org/doc/scipy-0.14.0/reference/spatial.distance.html) and sklearn.metrics.pairwise.euclidean_distances (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html), but I want to use this in Theano, so I need a pure mathematical formula rather than functions that compute the results.
For example, I need a series of transformations like A = X * B, D = X.T - X, results = D.T; something that contains just matrix operations, not function calls.

You can do this using numpy broadcasting, as shown in this gist. It should be straightforward to convert it to Theano code, or just see @eickenberg's comment above, since he's the one who showed me how to do this!
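For reference, here is a minimal sketch of the broadcasting trick using only matrix operations (the array X and its shape are made up for illustration); each operation has a direct Theano counterpart (T.sum, T.dot, T.sqrt, T.maximum):
import numpy as np

X = np.random.rand(5, 3)  # (n_classes, n_features)

# Squared norm of each row
sq_norms = (X ** 2).sum(axis=1)

# ||x_i - x_j||^2 = ||x_i||^2 - 2 * x_i . x_j + ||x_j||^2, via broadcasting
sq_dists = sq_norms[:, None] - 2 * X.dot(X.T) + sq_norms[None, :]

# Clip tiny negatives caused by floating-point error before the square root
dists = np.sqrt(np.maximum(sq_dists, 0))  # (n_classes, n_classes)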

Related

Calculate the inverse of a non-square matrix using numpy

Is there a way I can calculate the inverse of an mxn non-square matrix using numpy? Using la.inv(S) gives me the error ValueError: expected square matrix.
You are probably looking for np.linalg.pinv.
To invert a non-square mxn matrix, we can use np.linalg.pinv(S), where S is the matrix you want to pass.
For a square matrix we use np.linalg.inv(S); the inverse of a matrix is such that, when it is multiplied by the original matrix, the result is the identity matrix.
Note: np is numpy.
We can also use np.linalg.inv(S) for a non-square matrix, but to avoid the error you need to slice S down to a square submatrix.
For more details on np.linalg.pinv : https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html
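A short sketch of typical pinv usage (the matrix here is random, just for illustration):
import numpy as np

S = np.random.rand(4, 6)    # non-square m x n matrix
S_pinv = np.linalg.pinv(S)  # Moore-Penrose pseudoinverse, shape (6, 4)

# The pseudoinverse satisfies S @ S_pinv @ S == S (up to rounding)
print(np.allclose(S @ S_pinv @ S, S))  # True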

Implementation of Isotropic squared exponential kernel with numpy

I've come across a from-scratch implementation of Gaussian processes:
http://krasserm.github.io/2018/03/19/gaussian-processes/
There, the isotropic squared exponential kernel is implemented in numpy. It is defined as k(x_i, x_j) = sigma_f^2 * exp(-||x_i - x_j||^2 / (2 * l^2)), and the implementation is:
def kernel(X1, X2, l=1.0, sigma_f=1.0):
    sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)
    return sigma_f**2 * np.exp(-0.5 / l**2 * sqdist)
This is consistent with the implementation by Nando de Freitas: https://www.cs.ubc.ca/~nando/540-2013/lectures/gp.py
However, I am not quite sure how this implementation matches the provided formula, especially in the sqdist part. In my opinion it is wrong, yet it works (and delivers the same results as scipy's cdist with squared Euclidean distance). Why do I think it is wrong? If you multiply out the product (x - x')^T (x - x'), you get x^T x - 2 x^T x' + x'^T x', which is either a scalar or an nxn matrix for a vector x_i, depending on whether you define x_i as a column vector or not. The implementation, however, gives back an nx1 vector with the squared values.
I hope that anyone can shed light on this.
I found out: the implementation is correct. I simply wasn't aware of the (in my opinion, fuzzy) notation sometimes used in ML contexts. What is computed is a distance matrix: each row vector of matrix A is compared with each row vector of matrix B to build the covariance matrix, not (as I had somehow guessed) a direct distance between two matrices/vectors.
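For anyone double-checking the same thing, a small sketch (with arbitrary shapes and random data) confirming that the sqdist term equals the pairwise squared Euclidean distances between the rows of X1 and the rows of X2:
import numpy as np
from scipy.spatial.distance import cdist

X1 = np.random.rand(5, 2)
X2 = np.random.rand(7, 2)

# The kernel's sqdist term, computed row-vs-row via broadcasting
sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)

# Matches scipy's pairwise squared Euclidean distances
print(np.allclose(sqdist, cdist(X1, X2, 'sqeuclidean')))  # True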

Chi-Square distance in Python between two matrices

I have two matrices of dimension 36608x1. I want to compute the Chi-Square distance between them. I have searched for a Chi-Square method, but it's not working.
Try reshaping them into a single 2x36608 matrix.
If you are familiar with Python, you can use scipy.stats.chi2_contingency; this function returns the Chi-Square statistic.
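A minimal sketch of that approach, with made-up toy counts (chi2_contingency expects a contingency table of non-negative frequencies, so the real 36608-long vectors would need to hold counts):
import numpy as np
from scipy.stats import chi2_contingency

# Toy count vectors; the real ones would be 36608 elements long
x = np.array([10, 20, 30, 40])
y = np.array([15, 25, 20, 35])

# Stack into a 2 x n contingency table and run the chi-square test
table = np.vstack([x, y])
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)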

Numpy sum over planes of 3d array, return a scalar

I'm making the transition from MATLAB to Numpy and feeling some growing pains.
I have a 3D array, let's say it's 3x3x3, and I want the scalar sum of each plane.
In matlab, I would use:
sum_vec = sum(3dArray,3);
EDIT: I was wrong about my MATLAB code. MATLAB only vectorizes over one dimension, so a loop would be required. So numpy turns out to be more elegant... cool.
MATLAB
for i = 1:3
    sum_vec(i) = sum(sum(3dArray(:,:,i)));
end
You can do
sum_vec = np.array([plane.sum() for plane in cube])
or simply
sum_vec = cube.sum(-1).sum(-1)
where cube is your 3d array. You can specify 0 or 1 instead of -1 (or 2) depending on the orientation of the planes. The latter version is also better because it doesn't use a Python loop, which usually helps to improve performance when using numpy.
You should use the axis keyword in np.sum. As in many other numpy functions, axis lets you perform the operation along a specific axis. For example, to sum along the last dimension of the array (calling it arr3d here, since Python identifiers cannot start with a digit):
import numpy as np
sum_vec = np.sum(arr3d, axis=-1)
You'll get a resulting 2D array that corresponds to the sum along the last dimension, over all the slices arr3d[i, k, :].
UPDATE
I didn't understand exactly what you wanted: you want to sum over two dimensions (a plane). In that case you can apply the sum twice. For example, summing over the first two dimensions:
sum_vec = np.sum(np.sum(arr3d, axis=0), axis=0)
Instead of applying the same sum function twice, you may perform the sum on the reshaped array:
a = np.random.rand(10, 10, 10) # 3D array
b = a.view()
b.shape = (a.shape[0], -1)
c = np.sum(b, axis=1)
The above should be faster because you only sum once.
sumvec = np.sum(arr3d, axis=2)
or this works as well
sumvec = arr3d.sum(2)
Remember Python indexing starts at 0, so axis=2 refers to the 3rd dimension.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.sum.html
If you're trying to sum over a plane (and avoid loops, which is always a good idea) you can use np.sum and pass a tuple of axes as its argument.
For example, if you have an (n, 3, 3) array, then using
np.sum(a, (1, 2))
will give a result of shape (n,), summing over each 3x3 plane rather than over a single axis.
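As a quick cross-check, a sketch (with a made-up array) showing that the approaches from the answers above agree:
import numpy as np

arr = np.random.rand(4, 3, 3)  # a stack of four 3x3 planes

per_plane = np.sum(arr, axis=(1, 2))       # tuple of axes in one call
looped = np.array([p.sum() for p in arr])  # explicit Python loop
chained = arr.sum(-1).sum(-1)              # two single-axis sums

print(np.allclose(per_plane, looped) and np.allclose(per_plane, chained))  # True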

Vectorizing multiplication of matrices with different shapes in numpy/tensorflow

I have a 4x4 input matrix and I want to multiply every 2x2 slice with a weight stored in a 3x3 weight matrix. Please see the attached image for an example:
In the image, the colored section of the 4x4 input matrix is multiplied by the same colored section of the 3x3 weight matrix and stored in the 4x4 output matrix. When the slices overlap, the output takes the sum of the overlaps (e.g. the blue+red).
I am trying to perform this operation in Tensorflow 2.0 using eager tensors (which can be treated as numpy arrays). This is what I've written to perform this operation and it produces the expected output.
import numpy as np

inputm = np.ones([4, 4])    # initialize 4x4 input matrix
weightm = np.ones([3, 3])   # initialize 3x3 weight matrix
outputm = np.zeros([4, 4])  # initialize blank 4x4 output matrix

# iterate through each weight
for i in range(weightm.shape[0]):
    for j in range(weightm.shape[1]):
        outputm[i:i+2, j:j+2] += weightm[i, j] * inputm[i:i+2, j:j+2]
However, I don't think this is efficient since I am iterating through the weight matrix one-by-one, and this will be extremely slow when I need to perform this on large matrices of 500x500. I am having a hard time identifying a way to vectorize this operation, maybe tiling the weight matrix to be the same shape as the input matrix and performing a single matrix multiplication. I have also thought about flattening the matrix but I'm still not able to see a way to do this more efficiently.
Any advice will be much appreciated. Thanks in advance!
Alright, I think I have a solution, but it involves using both numpy operations (e.g. np.repeat) and TensorFlow 2.0 operations (i.e. tf.math.segment_sum). And to warn you, this is not the clearest or most elegant solution in the world, but it was the most elegant I could come up with. So here goes.
The main culprit in your problem is the weight matrix. If you manipulate it into a 4x4 matrix (with the correct sum of weights at each position), you get a weight matrix you can multiply element-wise with the input. And that's my solution. Note that this is designed for the 4x4 problem, and you should be able to extend it relatively easily to the 500x500 matrix.
import numpy as np
import tensorflow as tf
a = np.array([[1,2,3,4],[4,3,2,1],[1,2,3,4],[4,3,2,1]])
w = np.array([[5,4,3],[3,4,5],[5,4,3]])
# Make the weight matrix 6x6 by repeating it 2 times along both axes
w_rep = np.repeat(w,2,axis=0)
w_rep = np.repeat(w_rep,2,axis=1)
# Let's now jump in to tensorflow
tf_a = tf.constant(a)
tf_w = tf.constant(w_rep)
tf_segments = tf.constant([0,1,1,2,2,3])
# This is the trickiest bit. segment_sum gives the sum of segments along
# the very first dimension of a matrix, so we apply it to the weight
# matrix twice: once on the original and once on the transpose.
tf_w2 = tf.math.segment_sum(tf_w, tf_segments)
tf_w2 = tf.transpose(tf_w2)
tf_w2 = tf.math.segment_sum(tf_w2, tf_segments)
tf_w2 = tf.transpose(tf_w2)
print(tf_w2*a)
PS: I will try to include an illustration of what's going on here in a future edit. But I reckon that will take some time.
After realising @thushv89's trick, I realised you can get the same result by convolving the weight matrix with a matrix of ones:
import numpy as np
from scipy.signal import convolve2d
a = np.ones([4,4]) # initialize 4x4 input matrix
w = np.ones([3,3]) # initialize 3x3 weight matrix
b = np.multiply(a, convolve2d(w, np.ones((2,2))))
print(b)
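This works because a full 2D convolution of the 3x3 weight matrix with a 2x2 block of ones produces a 4x4 matrix whose entry at each position is the sum of all weights whose 2x2 slices cover that position, which is exactly what the nested loop accumulates. As a sanity check (reusing a, w, and b from the snippet above), the result should match the original loop:
# Original nested-loop implementation, for comparison
outputm = np.zeros([4, 4])
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        outputm[i:i+2, j:j+2] += w[i, j] * a[i:i+2, j:j+2]
print(np.allclose(b, outputm))  # True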