I read the following in this book:
The matrix that is actually returned by TruncatedSVD is the dot product of the U and S matrices.
Then I tried multiplying just U and Sigma:
US = U.dot(Sigma)
print("==>> US: ", US)
This time it produces the same result, just with flipped signs. So why doesn't TruncatedSVD need to multiply by VT?
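The book's statement follows from a short derivation that uses the orthonormality of V. In this sketch, V_k denotes the first k columns of V, U_k the first k columns of U, and \Sigma_k the top-left k x k block of \Sigma:

```latex
A = U \Sigma V^{T}, \quad V^{T} V = I
\;\Longrightarrow\;
A V = U \Sigma V^{T} V = U \Sigma
\;\Longrightarrow\;
A V_k = U_k \Sigma_k
```

So multiplying A by the first k right singular vectors already yields U_k \Sigma_k. V^T is only needed to map that k-dimensional result back to the original n columns.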
==========previous question===========
I am learning SVD. I found that NumPy and sklearn both provide related APIs, so I tried using them for dimensionality reduction. Below is the code:
import numpy as np
np.set_printoptions(precision=2, suppress=True)
A = np.array([
U, s, VT = np.linalg.svd(A)
print("==>> U: ", U)
print("==>> VT: ", VT)
# create m x n Sigma matrix
Sigma = np.zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
square_len = min((A.shape[0], A.shape[1]))
Sigma[:square_len, :square_len] = np.diag(s)
print("==>> Sigma: ", Sigma)
n_elements = 2
U = U[:, :n_elements]
Sigma = Sigma[:n_elements, :n_elements]
VT = VT[:n_elements, :n_elements]
# reconstruct
B = U.dot(Sigma.dot(VT))
print("==>> B: ", B)
The output B is:
==>> B: [[ 0.99 1.01]
[ 2.98 3.04]
[ 3.98 4.05]
[ 4.97 5.06]
[ 0.36 1.29]
[-0.37 0.73]
[ 0.18 0.65]]
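For comparison, here is a sketch of a full rank-k reconstruction on a made-up random 7 x 5 matrix, because the question's A is elided. Keeping k rows of VT with all of their columns (VT[:k, :] rather than VT[:k, :k]) gives an m x n approximation of A, not a k-dimensional projection. The question's VT[:n_elements, :n_elements] slice therefore returns only the first n_elements columns of that approximation. The check uses the Eckart-Young property: the Frobenius error equals the square root of the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))   # stand-in for the question's (elided) A
k = 2

# Thin SVD: U is (7, 5), s is (5,), VT is (5, 5)
U, s, VT = np.linalg.svd(A, full_matrices=False)

# Rank-k reconstruction: k columns of U, k singular values, k ROWS of VT (all columns kept)
A_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]
print(A_k.shape)  # (7, 5): same shape as A, nothing has been reduced

# Frobenius error of the best rank-k approximation
err = np.linalg.norm(A - A_k)
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```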
Then this is the sklearn code:
import numpy as np
from sklearn.decomposition import TruncatedSVD
A = np.array([
svd = TruncatedSVD(n_components=2)
svd.fit(A) # Fit model on training data A
print("==>> right singular vectors: ", svd.components_)
print("==>> svd.singular_values_: ", svd.singular_values_)
B = svd.transform(A) # Perform dimensionality reduction on A.
print("==>> B: ", B)
Its final output is:
==>> B: [[ 1.72 -0.22]
[ 5.15 -0.67]
[ 6.87 -0.9 ]
[ 8.59 -1.12]
[ 1.91 5.62]
[ 0.9 6.95]
[ 0.95 2.81]]
As we can see, they produce different results (though I notice their singular values are the same: 12.48 and 9.51 in both). How do I make them match? Am I misunderstanding something?
I think the correct way to perform a dimensionality reduction of the array A with np.linalg.svd is:
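U, s, Vh = np.linalg.svd(A)  # the third output is already V transposed
V = Vh.T  # columns of V are the right singular vectors
B = A @ V[:, :n_elements]  # project A onto the first n_elements right singular vectors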
Now B is:
array([[-1.72, 0.22],
[-5.15, 0.67],
[-6.87, 0.9 ],
[-8.59, 1.12],
[-1.91, -5.62],
[-0.9 , -6.95],
[-0.95, -2.81]])
That is exactly what you get from TruncatedSVD, but with the opposite sign.
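The identity A V_k = U_k Sigma_k (from A = U Sigma V^T with orthonormal V) can be checked numerically. This is a NumPy-only sketch on a random stand-in matrix, not the question's data. TruncatedSVD.transform(A) computes A @ svd.components_.T, where components_ holds right singular vectors. With the default randomized algorithm those vectors are approximate, and their signs may be flipped. The sign check shows why such a flip is harmless: each singular pair (u_i, v_i) is only defined up to a joint change of sign.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 5))   # stand-in for the question's A
k = 2

U, s, VT = np.linalg.svd(A, full_matrices=False)
V_k = VT[:k, :].T                 # first k right singular vectors as columns, shape (5, k)

B_proj = A @ V_k                  # projection onto V_k (the form TruncatedSVD.transform uses)
B_us = U[:, :k] * s[:k]           # U_k Sigma_k: column i of U scaled by s[i]
print(np.allclose(B_proj, B_us))  # True

# Flipping the sign of u_i and v_i together leaves A unchanged
signs = np.array([-1.0, 1.0, 1.0, 1.0, 1.0])
A_flipped = (U * signs) @ np.diag(s) @ (VT * signs[:, None])
print(np.allclose(A, A_flipped))  # True

# V_k^T is only needed to map the projection back to the original 5 columns
A_k = B_proj @ V_k.T
```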
We have a matrix M with shape (n, n) and a matrix C with shape (n, n). In this example, both matrices are generated using NumPy's np.random.normal(). We are encountering some unexpected behaviour in the associativity of matrix multiplication in both NumPy and TensorFlow. This is causing my model (written in TensorFlow) to give unexpected errors when computing the Cholesky decomposition of covariance matrices.
Below is a simple working example of this (the model actually uses 3D and 4D tensors):
import numpy as np
import tensorflow as tf
# Generate random matrices sampled from normal distribution
M = np.random.normal(0, 0.1, size=(5, 5))
C = np.random.normal(0, 0.1, size=(5, 5))
First, we check the NumPy output:
r1 = np.matmul(np.matmul(M, C),np.transpose(M))
r2 = np.matmul(M, np.matmul(C, np.transpose(M)))
r3 = (M @ C) @ np.transpose(M)
r4 = M @ (C @ np.transpose(M))
print("r1 == r2: ", np.all(r1 == r2))
print("r1 == r3: ", np.all(r1 == r3))
print("r1 == r4: ", np.all(r1 == r4))
print("r2 == r3: ", np.all(r2 == r3))
print("r2 == r4: ", np.all(r2 == r4))
print("r3 == r4: ", np.all(r3 == r4))
We get the following result:
Numpy's result.
Then we check the TensorFlow output:
# Check tensorflow
t1 = tf.linalg.matmul(tf.linalg.matmul(M, C), M, transpose_b = True)
t2 = tf.linalg.matmul(M, tf.linalg.matmul(C, M, transpose_b = True))
print("t1 == t2: ", np.all(t1.numpy() == t2.numpy()))
print(t1.numpy() - t2.numpy())
We get the following result:
Tensorflow's results.
I found a similar thread describing a similar issue here, but with no answer.
Do you think that differences on the order of 1e-19 can have enough impact that these weight matrices no longer have a Cholesky decomposition?
Thanks in advance.
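The following is a minimal NumPy sketch of what is likely going on; TensorFlow is not needed to see it. Floating-point addition is not associative, so the two groupings of M C M^T round differently, and == on floats is the wrong comparison. For this example only, the sketch assumes that C is a genuine covariance, built as B B^T plus a small diagonal jitter. A matrix drawn entrywise from np.random.normal is generally neither symmetric nor positive definite, so an M C M^T built from it need not have a Cholesky factorization at all, whatever the rounding. Rounding differences this small would typically only break a Cholesky factorization of a real covariance when the matrix is already close to singular. Symmetrizing the product before factorizing removes any asymmetry that rounding introduces.

```python
import numpy as np

# Floating-point addition is not associative: the grouping changes the rounding
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

rng = np.random.default_rng(0)
n = 5
M = rng.normal(0, 0.1, size=(n, n))
B = rng.normal(0, 0.1, size=(n, n))
# Assumption for this sketch: C is a real covariance (symmetric positive definite)
C = B @ B.T + 1e-3 * np.eye(n)

r1 = (M @ C) @ M.T
r2 = M @ (C @ M.T)
print(np.array_equal(r1, r2))   # exact comparison: may be False
print(np.allclose(r1, r2))      # comparison with a tolerance

# Rounding can leave r1 slightly asymmetric; keep only its symmetric part
S = 0.5 * (r1 + r1.T)
L = np.linalg.cholesky(S)       # lower-triangular factor with S = L @ L.T
```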
I found this fast script here on Stack Overflow for performing PCA on a given NumPy array.
I don't know how to plot this in 3D, or how to plot the cumulative explained variance against the number of components. This fast script uses the covariance method rather than singular value decomposition; maybe that's why I can't get the cumulative variances?
I tried plotting with this, but it doesn't work.
This is the code and my output:
from numpy import array, dot, mean, std, empty, argsort
from numpy.linalg import eigh, solve
from numpy.random import randn
from matplotlib.pyplot import subplots, show
def cov(X):
    """
    Covariance matrix
    note: specifically for mean-centered data
    note: numpy's `cov` uses N-1 as normalization
    """
    return dot(X.T, X) / X.shape[0]
    # N = data.shape[1]
    # C = empty((N, N))
    # for j in range(N):
    #     C[j, j] = mean(data[:, j] * data[:, j])
    #     for k in range(j + 1, N):
    #         C[j, k] = C[k, j] = mean(data[:, j] * data[:, k])
    # return C
def pca(data, pc_count = None):
    """
    Principal component analysis using eigenvalues
    note: this mean-centers and auto-scales the data (in-place)
    """
    data -= mean(data, 0)
    data /= std(data, 0)
    C = cov(data)
    E, V = eigh(C)
    key = argsort(E)[::-1][:pc_count]
    E, V = E[key], V[:, key]
    U = dot(data, V)
    print(f'Eigen Values: {E}')
    print(f'Eigen Vectors: {V}')
    print(f'Key: {key}')
    print(f'U: {U}')
    print(f'shape: {U.shape}')
    return U, E, V
data = dftransformed.transpose() # df transpose and convert to numpy
trans = pca(data, 3)[0]
fig, (ax1, ax2) = subplots(1, 2)
ax1.scatter(data[:50, 0], data[:50, 1], c = 'r')
ax1.scatter(data[50:, 0], data[50:, 1], c = 'b')
ax2.scatter(trans[:50, 0], trans[:50, 1], c = 'r')
ax2.scatter(trans[50:, 0], trans[50:, 1], c = 'b')
I understand the eigenvalues and eigenvectors, but I don't understand this key value, and the author didn't comment this part of the code in the answer. Does anyone know what each printed variable means?
Eigen Values: [126.30390621 68.48966957 26.03124927]
Eigen Vectors: [[-0.05998409 0.05852607 -0.03437937]
[ 0.00807487 0.00157143 -0.12352761]
[-0.00341751 0.03819162 0.08697668]
[-0.0210582 0.06601974 -0.04013712]
[-0.03558994 0.02953385 0.01885872]
[-0.06728424 -0.04162485 -0.01508154]]
Key: [439 438 437]
U: [[-12.70954048 8.97405411 -2.79812235]
[ -4.90853527 4.36517107 0.54129243]
[ -2.49370123 0.48341147 7.26682759]
[-16.07860635 6.16100749 5.81777637]
[ -1.81893291 6.48443689 -5.8655646 ]
[ 9.03939039 2.64196391 4.22056618]
[-14.71731064 9.19532016 -2.79275543]
[ 1.60998654 8.37866823 0.86207034]
[ -4.4503797 10.12688097 -5.12453656]
[ 12.16293556 2.2594413 -2.11730311]
[-15.76505125 9.48537581 -2.73906772]
[ -2.54289959 9.86768111 -4.84802992]
[ -5.78214902 9.21901651 -8.13594627]
[ -1.35428398 5.85550586 6.30553987]
[ 12.87261987 0.96283606 -3.26982121]
[ 24.57767477 -4.28214631 6.29510659]
[ 4.13941679 3.3688288 3.01194055]
[ -2.98318764 1.32775227 7.62610929]
[ -4.44461549 -1.49258339 1.39080386]
[ -0.10590795 -0.3313904 8.46363066]
[ 6.05960739 1.03091753 5.10875657]
[-21.27737352 -3.44453629 3.25115921]
[ -1.1183025 0.55238687 10.75611405]
[-10.6359291 7.58630341 -0.55088259]
[ 4.52557492 -8.05670864 2.23113833]
[-11.07822559 1.50970501 4.66555889]
[ -6.89542628 -19.24672805 -3.71322812]
[ -0.57831362 -17.84956249 -5.52002876]
[-12.70262277 -14.05542691 -2.72417438]
[ -7.50263129 -15.83723295 -3.2635125 ]
[ -7.52780216 -17.60790567 -2.00134852]
[ -5.34422731 -17.29394266 -2.69261597]
[ 9.40597893 0.21140292 2.05522806]
[ 12.12423431 -2.80281266 7.81182024]
[ 19.51224195 4.7624575 -11.20523383]
[ 22.38102384 0.82486072 -1.64716468]
[ -8.60947699 4.12597477 -6.01885407]
[ 9.56268414 1.18190655 -5.44074124]
[ 14.97675455 3.31666971 -3.30012109]
[ 20.47530869 -1.95896058 -1.91238615]]
shape: (40, 3)
trans = pca(data, 3)[0] is U: pca returns the tuple (U, E, V), and [0] selects its first element.
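key holds the column indices of the pc_count largest eigenvalues. np.linalg.eigh returns eigenvalues in ascending order. argsort(E) therefore lists indices from smallest to largest eigenvalue, [::-1] reverses that to largest first, and [:pc_count] keeps the first pc_count of them. With 440 features, as the printed output implies, the top three eigenvalues sit at indices 439, 438 and 437. Here is a toy sketch with a made-up 4 x 4 diagonal "covariance" matrix:

```python
import numpy as np
from numpy.linalg import eigh

C = np.diag([1.0, 4.0, 2.0, 3.0])  # made-up symmetric matrix for illustration
E, V = eigh(C)                     # eigh returns eigenvalues in ascending order
pc_count = 2

order = np.argsort(E)              # indices that sort E from smallest to largest
key = order[::-1][:pc_count]       # largest first, keep the top pc_count
E_top, V_top = E[key], V[:, key]   # top eigenvalues and their eigenvector columns
print(key)    # [3 2]
print(E_top)  # [4. 3.]
```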
ax2.scatter(trans[:50, 0], trans[:50, 1], c = 'r') plots column 0 against column 1 for the first 50 rows, and ax2.scatter(trans[50:, 0], trans[50:, 1], c = 'b') does the same for rows 50 to the end. That split comes from the 150-row sample data in this fast script, but your data only has shape (40, 3), i.e. only 40 rows.
To plot trans as a 3D scatter plot, extract each of the 3 columns into a separate variable and pass them to a 3D scatter plot.
# imports as shown in the linked answer
from numpy import array, dot, mean, std, empty, argsort
from numpy.linalg import eigh, solve
from numpy.random import randn
from matplotlib.pyplot import subplots, show
# other imports
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
# test data from linked answer (i.e. this fast script)
np.random.seed(365) # makes data repeatable
data = array([randn(8) for k in range(150)]) # creates array with shape (150, 8)
data[:50, 2:4] += 5 # adds 5 to first 50 rows of columns 2:4
data[50:, 2:5] += 5 # adds 5 to rows 50 onward of columns 2:5
# function call (pca is the function defined in the question)
trans = pca(data, 3)[0] # [0] gets U returned by pca(...)
# extract each column to a separate variable
x = trans[:, 0] # all rows of column 0
y = trans[:, 1] # all rows of column 1
z = trans[:, 2] # all rows of column 2
# plot 3d scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
plt.show()
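The covariance method is not what prevents a cumulative explained variance. The eigenvalues of the covariance matrix equal the squared singular values of the standardized data divided by N, so both routes give the same numbers. What pca() lacks is the total variance, because it keeps only the top pc_count eigenvalues. The sketch below uses sample data shaped like the script's. It is regenerated with np.random.default_rng, so the values differ from the seeded randn data. All eigenvalues are computed before any slicing.

```python
import numpy as np

# Sample data shaped like the script's (150, 8) example (different random values)
rng = np.random.default_rng(365)
data = rng.standard_normal((150, 8))
data[:50, 2:4] += 5
data[50:, 2:5] += 5

# Standardize the same way pca() does (std with ddof=0)
Z = (data - data.mean(axis=0)) / data.std(axis=0)
N = Z.shape[0]

# Covariance method: ALL eigenvalues, sorted largest first
E = np.linalg.eigh(Z.T @ Z / N)[0][::-1]

# SVD method: singular values come back largest first; s**2 / N are the same eigenvalues
s = np.linalg.svd(Z, compute_uv=False)
E_svd = s ** 2 / N

explained = E / E.sum()            # fraction of the total variance per component
cumulative = np.cumsum(explained)  # cumulative explained variance
print(np.round(cumulative, 3))
```

To plot it, plt.plot(range(1, len(cumulative) + 1), cumulative, marker='o') with "Number of components" and "Cumulative explained variance" as axis labels should work. Because the data is standardized, E.sum() equals the number of features. np.cumsum(E_top) / data.shape[1] therefore gives the same curve from the truncated eigenvalues that pca() returns.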
I am trying to implement a PyTorch covariance matrix operator. However, I notice the results are not the same between the Numpy implementation and my attempt, yet I do not understand why.
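I define the Bessel-corrected weighted covariance matrix as:
$$\Sigma = \frac{1}{\sum_{i=1}^{N} w_i - 1} \sum_{i=1}^{N} w_i \, (x_i - \mu^*)(x_i - \mu^*)^T$$
I define the weighted mean as:
$$\mu^* = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$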
I compare the NumPy method and my method as follows:
import numpy as np
import torch
x = np.random.randn(1000, 3)*1000
w = np.abs(np.random.randn(1000))*1000
x_torch = torch.DoubleTensor(x)
w_torch = torch.DoubleTensor(w)
#calculate weighted means
m_w = torch.sum(x_torch.T*w_torch, axis=1)/torch.sum(w_torch)
m_w_np = np.average(x, axis=0, weights=w)
#calculate weighted covariance matrix
Q = (x_torch-m_w).T
cov_w = (1.0 / (torch.sum(w_torch) - 1))*(w_torch*Q).mm(Q.T)
cov_w_np = np.cov(x.T, aweights=w.T)
print("NUMPY = {0}\n\nTORCH = {1}\n\nDIFFERENCE={2}".format(m_w_np, m_w.numpy(), m_w_np-m_w.numpy()))
print("NUMPY = {0}\n\nTORCH = {1}\n\nDIFFERENCE={2}".format(cov_w_np, cov_w.numpy(),cov_w_np-cov_w.numpy()))
This yields the following output:
NUMPY = [-21.10537208 -7.70801723 64.4034329 ]
TORCH = [-21.10537208 -7.70801723 64.4034329 ]
DIFFERENCE=[-7.10542736e-15 -1.77635684e-15 1.42108547e-14]
NUMPY = [[ 989468.17457696 13620.54885133 10723.87790683]
[ 13620.54885133 953966.92486133 21407.69378841]
[ 10723.87790683 21407.69378841 1019646.81044077]]
TORCH = [[ 987952.51042915 13599.68493868 10707.45110536]
[ 13599.68493868 952505.64141296 21374.90155234]
[ 10707.45110536 21374.90155234 1018084.91875621]]
DIFFERENCE=[[1515.6641478 20.86391265 16.42680147]
[ 20.86391265 1461.28344838 32.79223607]
[ 16.42680147 32.79223607 1561.89168456]]
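The gap appears to come from the normalization. With aweights (and the default ddof=1), np.cov divides by sum(w) - sum(w**2)/sum(w), not by sum(w) - 1. The sum(w) - 1 factor used above corresponds to integer frequency weights (fweights). The NumPy-only sketch below uses random data of the same shape and checks both normalizations against np.cov. In the PyTorch code, the corresponding change would be to replace torch.sum(w_torch) - 1 with torch.sum(w_torch) - torch.sum(w_torch**2)/torch.sum(w_torch).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 3)) * 1000
w = np.abs(rng.standard_normal(1000)) * 1000

# Weighted mean and centered observations (rows are observations)
w_sum = w.sum()
mean_w = (w[:, None] * x).sum(axis=0) / w_sum
Q = x - mean_w

# aweights normalization: sum(w) - sum(w**2) / sum(w)
fact_aweights = w_sum - (w ** 2).sum() / w_sum
cov_aw = (w[:, None] * Q).T @ Q / fact_aweights

# fweights normalization: sum(w) - 1, which needs integer counts
w_int = np.round(w).astype(int)
mean_f = (w_int[:, None] * x).sum(axis=0) / w_int.sum()
Q_f = x - mean_f
cov_fw = (w_int[:, None] * Q_f).T @ Q_f / (w_int.sum() - 1)

print(np.allclose(cov_aw, np.cov(x.T, aweights=w)))
print(np.allclose(cov_fw, np.cov(x.T, fweights=w_int)))
```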
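I want to solve the following non-linear system of equations:
$$K x + \sum_{k=1}^{N} \alpha_k (A_k x + a_k) = 0$$
$$\tfrac{1}{2} x^T A_k x + a_k \cdot x = 0, \quad k = 1, \dots, N$$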
The dot between a_k and x denotes the dot product.
The 0 in the first equation is the zero vector, and the 0 in the second equation is the scalar 0.
All the matrices are sparse, if that matters.
K is an n x n (positive definite) matrix
each A_k is a known (symmetric) matrix
each a_k is a known n x 1 vector
N is known (let's say N = 50), but I need a method where I can easily change N.
Unknowns (what I am solving for):
x is an n x 1 vector.
each alpha_k, for 1 <= k <= N, is a scalar
My thinking.
I am thinking of using scipy.optimize.root to find x and each alpha_k. We have n equations from the rows of the first equation and another N equations from the constraints, for n + N unknowns, so the number of equations matches the number of unknowns.
I also have a reliable initial guess for x and the alpha_k's.
Toy example.
import numpy as np
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0],[0,0,1,0], [0,0,0,0.5]])
A_1 = np.matrix([[0.98,0,0.46,0.80],[0,0,0.56,0],[0.93,0.82,0,0.27],[0,0,0,0.23]])
A_2 = np.matrix([[0.23, 0,0,0],[0.03,0.01,0,0],[0,0.32,0,0],[0.62,0,0,0.45]])
a_1 = np.matrix(np.random.rand(4,1))
a_2 = np.matrix(np.random.rand(4,1))
We are trying to solve for
x = [x1, x2, x3, x4] and alpha_1, alpha_2
I can brute-force this toy problem and feed it to the solver. But how do I solve it in a way that extends easily to, say, n = 50 and N = 50?
Will I have to compute the Jacobian explicitly for larger matrices?
Can anyone give me any pointers?
I think the scipy.optimize.root approach holds water, but steering clear of the trivial solution might be the real challenge for this system of equations.
In any event, this function uses root to solve the system of equations.
import numpy as np
from scipy.optimize import root
def solver(x0, alpha0, K, A, a):
    """
    x0 - nx1 numpy array. Initial guess on x.
    alpha0 - Nx1 numpy array. Initial guess on alpha.
    K - nxn numpy.array.
    A - Length N list of nxn numpy.arrays.
    a - Length N list of nx1 numpy.arrays.
    """
    # Establish the function that produces the lhs of the system of equations.
    n = K.shape[0]
    N = len(A)
    def lhs(x_alpha):
        """
        x_alpha is a concatenation of x and alpha.
        """
        x = np.ravel(x_alpha[:n])
        alpha = np.ravel(x_alpha[n:])
        lhs_top = np.ravel(K.dot(x))
        for k in range(N):
            lhs_top += alpha[k]*(np.ravel(np.dot(A[k], x)) + np.ravel(a[k]))
        lhs_bottom = [0.5*x.dot(np.ravel(A[k].dot(x))) + np.ravel(a[k]).dot(x)
                      for k in range(N)]
        lhs = np.array(lhs_top.tolist() + lhs_bottom)
        return lhs
    # Solve the system of equations.
    x0.shape = (n, 1)
    alpha0.shape = (N, 1)
    x_alpha_0 = np.vstack((x0, alpha0))
    sol = root(lhs, x_alpha_0)
    x_alpha_root = sol['x']
    # Compute norm of residual.
    res = sol['fun']
    res_norm = np.linalg.norm(res)
    # Break out the x and alpha components.
    x_root = x_alpha_root[:n]
    alpha_root = x_alpha_root[n:]
    return x_root, alpha_root, res_norm
Running on the toy example, however, only produces the trivial solution.
# Toy example.
import numpy as np
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0],[0,0,1,0], [0,0,0,0.5]])
A_1 = np.matrix([[0.98,0,0.46,0.80],[0,0,0.56,0],[0.93,0.82,0,0.27],
                 [0,0,0,0.23]])
A_2 = np.matrix([[0.23, 0,0,0],[0.03,0.01,0,0],[0,0.32,0,0],
                 [0.62,0,0,0.45]])
a_1 = np.matrix(np.random.rand(4,1))
a_2 = np.matrix(np.random.rand(4,1))
A = [A_1, A_2]
a = [a_1, a_2]
x0 = np.random.rand(n, 1)
alpha0 = np.random.rand(N, 1)
print('x0 =', x0)
print('alpha0 =', alpha0)
x_root, alpha_root, res_norm = solver(x0, alpha0, K, A, a)
print('x_root =', x_root)
print('alpha_root =', alpha_root)
print('res_norm =', res_norm)
Output is
x0 = [[ 0.00764503]
[ 0.08058471]
[ 0.88300129]
[ 0.85299622]]
alpha0 = [[ 0.67872815]
[ 0.69693346]]
x_root = [ 9.88131292e-324 -4.94065646e-324 0.00000000e+000
alpha_root = [ -4.94065646e-324 0.00000000e+000]
res_norm = 0.0
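On the Jacobian question: for this system the Jacobian has a simple block form, so it can be supplied analytically and built with vectorized operations that work for any n and N. Column k of the upper-right block is A_k x + a_k, and the upper-left block is K + sum_k alpha_k A_k. Row k of the lower-left block is (sym(A_k) x + a_k)^T, where sym(A_k) = (A_k + A_k^T)/2, so for symmetric A_k that block is just the transpose of the upper-right one. The sketch below assumes the system is exactly what lhs() above encodes. make_system and solve are hypothetical helper names. The sketch does nothing to steer away from the trivial solution: x = 0, alpha = 0 always satisfies both equations, and that is the root the answer's run converged to.

```python
import numpy as np
from scipy.optimize import root


def make_system(K, A, a):
    """Return fun(z) -> (residual, jacobian) for z = concatenate(x, alpha).

    Assumed system (same as lhs() above):
        K x + sum_k alpha_k (A_k x + a_k) = 0      (n equations)
        0.5 x^T A_k x + a_k . x = 0, k = 1..N      (N equations)
    """
    K = np.asarray(K, dtype=float)
    n, N = K.shape[0], len(A)
    A = np.asarray(A, dtype=float)                   # shape (N, n, n)
    a = np.asarray(a, dtype=float).reshape(N, n)     # shape (N, n)
    A_sym = 0.5 * (A + np.transpose(A, (0, 2, 1)))   # symmetric part of each A_k

    def fun(z):
        x, alpha = z[:n], z[n:]
        Ax = A @ x                                   # row k is A_k x, shape (N, n)
        G = (Ax + a).T                               # column k is A_k x + a_k, shape (n, N)
        top = K @ x + G @ alpha
        bottom = 0.5 * np.einsum("i,ki->k", x, Ax) + a @ x
        J = np.zeros((n + N, n + N))
        J[:n, :n] = K + np.tensordot(alpha, A, axes=1)   # K + sum_k alpha_k A_k
        J[:n, n:] = G                                    # d(top)/d(alpha)
        J[n:, :n] = A_sym @ x + a                        # d(bottom)/dx, row k per constraint
        return np.concatenate([top, bottom]), J          # d(bottom)/d(alpha) stays 0

    return fun


def solve(K, A, a, x0, alpha0):
    """Hypothetical wrapper: run scipy.optimize.root with the analytic Jacobian."""
    fun = make_system(K, A, a)
    n = np.asarray(K).shape[0]
    z0 = np.concatenate([np.ravel(x0), np.ravel(alpha0)])
    sol = root(fun, z0, jac=True, method="hybr")     # jac=True: fun returns (f, J)
    return sol.x[:n], sol.x[n:], np.linalg.norm(sol.fun), sol.success


# Usage at the toy sizes with random data (illustration only)
rng = np.random.default_rng(0)
n, N = 4, 2
K = np.diag([0.5, 1.0, 1.0, 0.5])
A = [rng.random((n, n)) for _ in range(N)]
a = [rng.random((n, 1)) for _ in range(N)]
x_root, alpha_root, res_norm, ok = solve(K, A, a, rng.random(n), rng.random(N))
print(x_root, alpha_root, res_norm, ok)
```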
X is an n by d matrix and W is an m by d matrix. For every row in X, I want to compute the squared Euclidean distance to every row in W, so the result will be an n by m matrix.
If there's only one row in W, this is easy:
import theano
from theano import tensor
x = tensor.TensorType("float64", [False, False])()
w = tensor.TensorType("float64", [False])()
z = tensor.sum((x-w)**2, axis=1)
fn = theano.function([x, w], z)
print(fn([[1,2,3], [2,2,2]], [2,2,2]))
# [ 2. 0.]
What do I do when W is a matrix (in Theano)?
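Short answer: use scipy.spatial.distance.cdist (with metric='sqeuclidean' for squared distances).
Long answer, if you don't have scipy: broadcast-subtract, then take the norm along the feature axis.
np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
Really long answer: if you have an ancient version of NumPy without a vectorizable linalg.norm (e.g. you're using Abaqus), use
np.sum((X[:, None, :] - W[None, :, :])**2, axis=2) ** 0.5
Both lines return plain Euclidean distances; leave off the ** 0.5 in the second one to get the squared distances asked for.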
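Broadcasting materializes an (n, m, d) intermediate, which can get large. A common alternative, sketched here in NumPy, expands ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, so only (n, m) arrays are formed. The same three-term expression should carry over to Theano's symbolic tensors, though that port is not shown here. sq_dists is a hypothetical helper name.

```python
import numpy as np


def sq_dists(X, W):
    """Pairwise squared Euclidean distances between rows of X (n, d) and rows of W (m, d)."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    # ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, evaluated for all (row of X, row of W) pairs
    D = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ W.T + (W ** 2).sum(axis=1)[None, :]
    return np.maximum(D, 0.0)  # clip tiny negatives that rounding can produce


X = [[0, 1, 2], [1, 2, 3]]
W = [[1, 1, 1], [2, 2, 2]]
print(sq_dists(X, W))  # same values as the OP's Theano example: [[2, 5], [5, 2]]
```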
Edit by OP
In Theano we can make X and W both 3D tensors and make the corresponding axes broadcastable, like this:
x = tensor.TensorType("float64", [False, True, False])()
w = tensor.TensorType("float64", [True, False, False])()
z = tensor.sum((x-w)**2, axis=2)
fn = theano.function([x, w], z)
print(fn([[[0,1,2]], [[1,2,3]]], [[[1,1,1], [2,2,2]]]))
# [[ 2. 5.]
# [ 5. 2.]]
Luckily, the number of rows in W is known in advance, so for now I'm doing:
x = tensor.TensorType("float64", [False, False])()
m = 2
w = tensor.as_tensor([[2,2,2],[1,2,3]])
res_list = []
for i in range(m):
    res_list.append(tensor.sum((x-w[i,:])**2, axis=1))
z = tensor.stack(res_list)
fn = theano.function([x], z)
print(fn([[1,2,3], [2,2,2], [2,3,4]]))
# [[ 2. 0. 5.]
# [ 0. 2. 3.]]
Other answers are welcome!