Chi-Square distance in Python between two matrices

I have two matrices with dimension 36608*1. I want to compute the Chi-Square distance between the two matrices. I have searched for a Chi-Square method but it's not working.

Try reshaping them into a single 2*36608 matrix.
If you are familiar with Python,
you can use scipy.stats.chi2_contingency; it returns the Chi-Square statistic (along with the p-value, degrees of freedom, and expected frequencies).
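Note that scipy.stats.chi2_contingency computes a chi-square *test statistic* on a contingency table, which is not quite the same thing as the chi-square *distance* often used to compare histogram-like feature vectors. If the latter is what's needed, it is a one-liner in numpy; a minimal sketch with made-up toy vectors standing in for the 36608*1 matrices:

```python
import numpy as np

# Toy stand-ins for the two 36608x1 vectors (hypothetical values).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 1.0, 5.0])

# Chi-square distance as commonly defined for histograms:
#   d(a, b) = 0.5 * sum_i (a_i - b_i)^2 / (a_i + b_i)
# Bins where a_i + b_i == 0 contribute nothing and are skipped.
mask = (a + b) > 0
dist = 0.5 * np.sum((a[mask] - b[mask]) ** 2 / (a[mask] + b[mask]))
print(dist)  # 0.7222... for these toy vectors
```

For real 36608-element vectors the same two lines apply unchanged; only non-negative entries make sense under this definition.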

Related

Calculate the inverse of a non-square matrix using numpy

Is there a way I can calculate the inverse of an mxn non-square matrix using numpy? Using la.inv(S) gives me the error ValueError: expected square matrix.
You are probably looking for np.linalg.pinv.
To invert a non-square mxn matrix, use np.linalg.pinv(S), where S is the matrix you want to pseudo-invert.
For a square matrix we use np.linalg.inv(S); the inverse of a matrix is defined so that multiplying it by the original matrix yields the identity matrix.
Note: np is numpy.
np.linalg.inv(S) can also be used on a non-square matrix, but only after slicing S down to a square submatrix; otherwise you get the error above.
For more details on np.linalg.pinv : https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html
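A minimal sketch of the difference, with an arbitrary 3x2 example matrix: pinv works where inv raises, and it satisfies the defining Moore-Penrose property S @ pinv(S) @ S == S.

```python
import numpy as np

# A non-square (3x2) matrix: np.linalg.inv would raise ValueError here.
S = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

P = np.linalg.pinv(S)              # Moore-Penrose pseudoinverse, shape (2, 3)

# The pseudoinverse satisfies S @ P @ S == S (up to rounding),
# which is the defining property for non-square matrices.
print(np.allclose(S @ P @ S, S))   # True

# For a square, invertible matrix, inv and pinv agree.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
print(np.allclose(np.linalg.inv(A), np.linalg.pinv(A)))  # True
```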

Implementation of Isotropic squared exponential kernel with numpy

I've come across a from scratch implementation for gaussian processes:
http://krasserm.github.io/2018/03/19/gaussian-processes/
There, the isotropic squared exponential kernel is implemented in numpy. The kernel is
k(x_i, x_j) = sigma_f^2 * exp(-||x_i - x_j||^2 / (2 * l^2))
and the implementation is:
def kernel(X1, X2, l=1.0, sigma_f=1.0):
    sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)
    return sigma_f**2 * np.exp(-0.5 / l**2 * sqdist)
consistent with the implementation of Nando de Freitas: https://www.cs.ubc.ca/~nando/540-2013/lectures/gp.py
However, I am not quite sure how this implementation matches the provided formula, especially in the sqdist part. In my opinion it is wrong, yet it works (and delivers the same results as scipy's cdist with squared euclidean distance). Why do I think it is wrong? If you multiply out the product of the two matrices, you get
x_i^T x_i
which is either a scalar or an nxn matrix for a vector x_i, depending on whether you define x_i to be a column vector or not. The implementation, however, gives back an nx1 vector of the squared values.
I hope that anyone can shed light on this.
I found out: the implementation is correct. I was just not aware of the (in my opinion) fuzzy notation that is sometimes used in ML contexts. What is to be achieved is a distance matrix: each row vector of matrix A is compared with each row vector of matrix B to build the covariance matrix, not (as I somehow guessed) the direct distance between two matrices/vectors.
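This can be checked directly: the vectorized sqdist expression is exactly the row-pairwise expansion ||x_i - x_j||^2 = x_i.x_i + x_j.x_j - 2 x_i.x_j, so the kernel matrix agrees with an explicit double loop over row pairs. A small self-contained check with random data:

```python
import numpy as np

def kernel(X1, X2, l=1.0, sigma_f=1.0):
    # ||x_i - x_j||^2 = x_i.x_i + x_j.x_j - 2 x_i.x_j, for all row pairs at once:
    sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)
    return sigma_f**2 * np.exp(-0.5 / l**2 * sqdist)

rng = np.random.default_rng(0)
X1 = rng.standard_normal((5, 3))   # 5 points in R^3
X2 = rng.standard_normal((4, 3))   # 4 points in R^3

K = kernel(X1, X2)                 # shape (5, 4): one entry per pair of rows

# Explicit double loop over row pairs (l = 1, sigma_f = 1).
K_loop = np.array([[np.exp(-0.5 * np.sum((a - b) ** 2)) for b in X2] for a in X1])
print(np.allclose(K, K_loop))      # True
```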

how to do text clustering from cosine similarity

I am using WEKA for text clustering. Suppose I have n documents with text. I calculated TF-IDF as the feature vector for each document and then calculated the cosine similarity between each pair of documents, which generated an nxn matrix. Now I wonder how to use this nxn matrix in the k-means algorithm. I know I can apply some dimension reduction such as MDS or PCA. What confuses me is that after applying dimension reduction, how will I identify the documents themselves? For example, if I have 3 documents d1, d2, d3, then cosine will give me the distances
d11, d12, d13
d21, d22, d23
d31, d32, d33
Now I am not sure what the output will be after PCA or MDS, and how I will identify the documents after k-means. Please suggest. I hope I have put my question clearly.
PCA is used on the raw data, not on distances, i.e. PCA(X).
MDS uses a distance function, i.e. MDS(X, cosine).
You appear to believe you need to run PCA(cosine(X)). That doesn't work.
You want to run MDS(X, cosine) instead.
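On the identification worry: MDS returns one embedded point per input row, in the same order, so row i of the embedding is still document i, and the cluster label assigned to row i belongs to document i. A minimal numpy-only sketch using classical (Torgerson) MDS on a made-up toy TF-IDF matrix (WEKA's own MDS/k-means would be used in practice; this only illustrates the row-order point):

```python
import numpy as np

# Toy stand-in for TF-IDF vectors of 3 documents (rows = d1, d2, d3).
X = np.array([[1.0, 0.0, 1.0],
              [0.9, 0.1, 1.0],
              [0.0, 1.0, 0.0]])

# Cosine distance matrix D (n x n): D[i, j] = 1 - cos(x_i, x_j).
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
D = 1.0 - Xn @ Xn.T

# Classical MDS: double-center the squared distances, then take the
# leading eigenvectors. Row i of the embedding corresponds to document i.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D**2) @ J
w, V = np.linalg.eigh(B)
idx = np.argsort(w)[::-1][:2]                       # keep the 2 largest components
emb = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))  # shape (3, 2)

print(emb.shape)  # (3, 2): one 2-D point per document, in the original order
```

Feeding emb to k-means then yields one label per row, i.e. one label per document; similar documents (d1, d2 here) end up close together in the embedding.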

A Pure Pythonic Pairwise Euclidean distance of rows of a numpy ndarray

I have a matrix of size (n_classes, n_features) and I want to compute the pairwise euclidean distance of each pair of classes, so the output would be an (n_classes, n_classes) matrix where each cell holds euclidean_distance(class_i, class_j).
I know there are the scipy spatial distances (http://docs.scipy.org/doc/scipy-0.14.0/reference/spatial.distance.html) and sklearn.metrics.pairwise.euclidean_distances (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html), but I want to use this in Theano, so I need a pure mathematical formula rather than functions that compute the result.
For example, I need a series of transformations like A = X * B, D = X.T - X, results = D.T: something that contains just matrix operations, not function calls.
You can do this using numpy broadcasting as shown in this gist. It should be straightforward to convert this to Theano code, or just reference #eickenberg's comment above, since he's the one who showed me how to do this!
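The pure matrix-operation formulation behind that broadcasting trick is D2[i, j] = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j, which needs only a Gram matrix, its diagonal, and elementwise ops, so it translates directly to Theano. A numpy sketch with an arbitrary example matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))    # (n_classes, n_features)

# Pure matrix formulation:  D2[i, j] = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
G = X @ X.T                        # Gram matrix, (n, n)
sq = np.diag(G)                    # squared row norms
D2 = sq[:, None] + sq[None, :] - 2.0 * G
D2 = np.maximum(D2, 0.0)           # clip tiny negatives from floating-point rounding
D = np.sqrt(D2)                    # (n_classes, n_classes) distance matrix

# Check against explicit norms over row pairs.
ref = np.array([[np.linalg.norm(a - b) for b in X] for a in X])
print(np.allclose(D, ref))         # True
```

Every step is a matmul, a diagonal extraction, or an elementwise op, so the same graph can be written with Theano tensor operations.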

Optimize Blas-like operation - A`*B*A

Given two matrices A and B, where B is symmetric (and positive semi-definite), what is the best (fastest) way to calculate A`*B*A?
Currently, using BLAS, I first compute C = B*A with dsymm (introducing a temporary matrix C) and then A`*C with dgemm.
Is there a better (faster, temporary-free) way to do this with BLAS and MKL?
Thanks.
I'll offer some kind of answer: compared to the general case A*B*C, you know that the end result is a symmetric matrix. After computing C = B*A with the BLAS subroutine dsymm, you want to compute A`*C, but you only need to compute the upper triangular part of the result and then copy the strictly upper triangular part to the lower triangle.
Unfortunately there doesn't seem to be a BLAS routine where you can declare beforehand that, given two general matrices, the output will be symmetric. I'm not sure whether it would be beneficial to write your own function for this; it probably depends on the size of your matrices and the implementation.
EDIT:
This idea seems to be addressed recently here: A Matrix Multiplication Routine that Updates Only the Upper or Lower Triangular Part of the Result Matrix
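A numpy sketch of the idea (not actual BLAS calls) with arbitrary small matrices: the two-step product, the symmetry of the result that makes the half-computation legal, and reconstructing the full matrix from its upper triangle.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B0 = rng.standard_normal((4, 4))
B = B0 @ B0.T                      # symmetric positive semi-definite

# Two-step computation: C = B*A (dsymm in BLAS terms), then M = A'*C (dgemm).
C = B @ A
M = A.T @ C                        # (3, 3); symmetric because (A'BA)' = A'B'A = A'BA

# Exploiting the symmetry: keep only the upper triangle and mirror it.
# (Illustrative only; a real BLAS-level kernel would never compute the lower half.)
U = np.triu(M)
M_sym = U + U.T - np.diag(np.diag(U))

print(np.allclose(M, M.T), np.allclose(M, M_sym))  # True True
```

Roughly half the flops of the second multiply are redundant, which is the saving the triangular-update routine in the linked question targets.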