I've got an array of time series data of shape (2466, 2498, 9) ((asset, date, feature)).
I've got 9 features, on which I want to do PCA to reduce the dimensionality on this axis.
I'm struggling to calculate the covariance matrix, Z = X.T @ X.
I think I want to express this as an einsum, but I'm not sure how. I'm certainly interested in other methods as well, as the purpose of this is to learn numpy, rather than actually solve a problem.
Edit: This is my (apparently wrong) attempt so far:
np.einsum('ijk,ijl->ijkl', myData, myData)
(This just hangs my system.)
Edit 2:
I've come to understand that I should be using np.linalg.svd for this problem.
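A minimal sketch of the covariance step, using a small random stand-in for myData and assuming that every (asset, date) pair is treated as one sample. The attempt above keeps the i and j axes in the output, so it asks for a 2466 x 2498 x 9 x 9 array (roughly 500 million entries, about 4 GB in float64). Summing over those two axes instead gives the 9x9 matrix directly. The variable names here are illustrative only.

```python
import numpy as np

# Small stand-in for the (asset, date, feature) array; the real shape is (2466, 2498, 9).
rng = np.random.default_rng(0)
my_data = rng.standard_normal((20, 30, 9))

# Centre each feature over all (asset, date) samples.
centered = my_data - my_data.mean(axis=(0, 1))

# Sum over the asset and date axes, leaving a 9x9 feature-by-feature matrix.
n_samples = centered.shape[0] * centered.shape[1]
cov_einsum = np.einsum('ijk,ijl->kl', centered, centered) / (n_samples - 1)

# Equivalent: flatten to (samples, features) and use a plain matrix product.
flat = centered.reshape(-1, centered.shape[-1])
cov_matmul = flat.T @ flat / (n_samples - 1)

# Principal axes from the eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov_einsum)
```

The SVD route mentioned in Edit 2 would work on `flat` directly and avoids forming the covariance matrix at all.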
I'm trying to use linalg to find $P^{500}$, where $P$ is a 9x9 matrix, but Python displays the following:
Matrix full of inf
I think this is too much for this method, so my question is: is there another library that can compute $P^{500}$? Must I surrender?
Thank you all in advance
Use the eigendecomposition and then raise the diagonal matrix of eigenvalues to the power, as in the code below. With a random matrix you still end up getting an inf in the first column. I believe you cannot avoid this unless you control the eigenvalues of the matrix; in other words, your eigenvalues have to be bounded. You can generate a random matrix with chosen eigenvalues from the Schur decomposition by putting the eigenvalues along the diagonal. This is a post I have about generating a matrix with given eigenvalues. This should be the way that method works anyway.
% Generate random 9x9 matrix
n=9;
A = randn(n);
[V,D] = eig(A);
p = 500;
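Dp = D^p;            % D is diagonal, so this raises each eigenvalue to the power p
Ap = V*Dp/V;         % A = V*D*V^(-1), hence A^p = V*D^p*V^(-1)
Ap1 = mpower(A,p);   % built-in matrix power, for comparison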
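Since the question is about NumPy, here is a rough NumPy counterpart of the same idea. It is only a sketch: it uses a small power so that the result stays finite and can be checked against np.linalg.matrix_power, and it assumes the matrix is diagonalisable.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((9, 9))
p = 5  # small power so the result stays finite and can be checked

# A = V diag(w) V^{-1}; the eigenvalues of a real matrix may be complex.
w, V = np.linalg.eig(A)

# A^p = V diag(w^p) V^{-1}; V * w**p scales column j of V by w[j]**p.
Ap_eig = (V * w**p) @ np.linalg.inv(V)

# Reference: repeated squaring on the matrix itself.
Ap_ref = np.linalg.matrix_power(A, p)
```

For a real matrix the imaginary part of `Ap_eig` is rounding noise, so `Ap_eig.real` is the quantity to compare with `Ap_ref`.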
NumPy arrays have homogeneous data types, and the maximum value of the float (double precision) datatype is
>>> np.finfo('d').max
1.7976931348623157e+308
>>> _**0.002
4.135322944991858
>>> np.array(4.135)**500
1.7288485271474026e+308
>>> np.array(4.136)**500
__main__:1: RuntimeWarning: overflow encountered in power
inf
So if the entries grow by a factor of more than approx. 4.135 per multiplication, the result is going to blow up within 500 multiplications. Once one entry overflows, the next product is multiplied with infinities, more entries become infinite, and eventually everything is inf.
Metahominid's suggestion certainly helps, but it will not solve the issue if your eigenvalues are larger in magnitude than this value. In general, you need to use specialized high-precision tools to get correct results.
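If the result is only needed up to a known scale factor, one possible workaround that stays in double precision is to divide the matrix by its spectral radius before taking the power and to carry the scale separately as a logarithm. This is a sketch under that assumption, not something taken from the answers above; `scaled_matrix_power` is a hypothetical helper name.

```python
import numpy as np

def scaled_matrix_power(P, p):
    """Return (M, log_scale) such that the matrix power P^p equals M * exp(log_scale).

    Hypothetical helper: P is divided by its spectral radius before
    exponentiation so that the entries of M stay representable.
    """
    rho = np.max(np.abs(np.linalg.eigvals(P)))  # spectral radius of P
    M = np.linalg.matrix_power(P / rho, p)
    return M, p * np.log(rho)

rng = np.random.default_rng(0)
# Positive entries in [0, 2); the spectral radius is well above 4.135.
P = rng.random((9, 9)) * 2.0

M, log_scale = scaled_matrix_power(P, 500)
```

Here `M` is finite while `log_scale` exceeds the logarithm of the largest double, which is why the unscaled power cannot be stored. Matrices that are far from normal can still grow after scaling, so this is not a general replacement for high-precision arithmetic.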
I have the following equation:
Where M is a [Dx3] matrix and V is a [DxD] matrix. Each of these forms a [3x3] block in a larger [3Kx3K] matrix, indexed by i, j. For now, I'm wondering if anyone has come across doing a reduce-sum of this form in TensorFlow - I'm still getting used to the API structure!
I'm trying to solve a large eigenvalue problem with SciPy where the matrix A is dense but I can compute its action on a vector without having to assemble A explicitly. So in order to avoid memory issues when the matrix A gets big, I'd like to use the sparse solver scipy.sparse.linalg.eigs with a LinearOperator that implements this action.
Applying eigs to an explicit numpy array A works fine. However, if I apply eigs to a LinearOperator instead then the iterative solver fails to converge. This is true even if the matvec method of the LinearOperator is simply matrix-vector multiplication with the given matrix A.
A minimal example illustrating the failure is attached below (I'm using shift-invert mode because I am interested in the smallest few eigenvalues). This computes the eigenvalues of a random matrix A just fine, but fails when applied to a LinearOperator that is directly converted from A. I tried to fiddle with the parameters for the iterative solver (v0, ncv, maxiter) but to no avail.
Am I missing something obvious? Is there a way to make this work? Any suggestions would be highly appreciated. Many thanks!
Edit: I should clarify what I mean by "make this work" (thanks, Dietrich). The example below uses a random matrix for illustration. However, in my application I know that the eigenvalues are almost purely imaginary (or almost purely real if I multiply the matrix by 1j). I'm interested in the 10-20 smallest-magnitude eigenvalues, but the algorithm doesn't behave well (i.e., never stops even for small-ish matrix sizes) if I specify which='SM'. Therefore I'm using shift-invert mode by passing the parameters sigma=0.0, which='LM'. I'm happy to try a different approach so long as it allows me to compute a bunch of smallest-magnitude eigenvalues.
from scipy.sparse.linalg import eigs, LinearOperator, aslinearoperator
import numpy as np
# Set a seed for reproducibility
np.random.seed(0)
# Size of the matrix
N = 100
# Generate a random matrix of size N x N
# and compute its eigenvalues
A = np.random.random_sample((N, N))
eigvals = eigs(A, sigma=0.0, which='LM', return_eigenvectors=False)
print(eigvals)
# Convert the matrix to a LinearOperator
A_op = aslinearoperator(A)
# Try to solve the same eigenproblem again.
# This time it produces an error:
#
# ValueError: Error in inverting M: function gmres did not converge (info = 1000).
eigvals2 = eigs(A_op, sigma=0.0, which='LM', return_eigenvectors=False)
I tried running your code, but not passing the sigma parameter to eigs() and it ran without problems (read eigs() docs for its meaning). I didn't see the benefit of it in your example.
Eigs can already find the smallest eigenvalues first. Set which = 'SM'
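For what it is worth, the error message in the question points at the inner linear solve: in shift-invert mode eigs has to apply the inverse of (A - sigma*I), and when A is only a LinearOperator it falls back to an iterative solver (gmres) for that step. eigs accepts an OPinv argument for supplying that inverse yourself. The sketch below builds OPinv from an LU factorisation of the explicit matrix purely for illustration; in a genuinely matrix-free setting you would need your own solver or preconditioner in its place.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, aslinearoperator, eigs

rng = np.random.default_rng(0)
N = 100
A = rng.random((N, N))
A_op = aslinearoperator(A)

sigma = 0.0

# Stand-in solver for (A - sigma*I) x = b. It factorises the explicit matrix
# once; a matrix-free application would have to replace this part.
lu_piv = lu_factor(A - sigma * np.eye(N))
OPinv = LinearOperator((N, N), matvec=lambda b: lu_solve(lu_piv, b), dtype=A.dtype)

# Same call as in the question, but with the inverse supplied explicitly.
eigvals_op = eigs(A_op, sigma=sigma, which='LM', OPinv=OPinv,
                  return_eigenvectors=False)
```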
NOTE:
Speed is not as important as getting a final result.
However, some speed up over worst case is required as well.
I have a large array A:
A.shape=(20000,265) # or possibly larger like 50,000 x 265
I need to compute the correlation coefficients.
np.corrcoef # internally casts the results as doubles
I just borrowed their code and wrote my own cov/corr without casting to doubles, since I really only need 32-bit floats. I also ditched the conj(), since my data are always real.
cov = A.dot(A.T)/n # where A is an array of 32-bit floats
d = np.diag(cov)
corr = cov / np.sqrt(np.multiply.outer(d, d))
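A self-contained version of this idea might look like the sketch below. It is an illustration on a small array, not the original code: `corrcoef_f32` is a hypothetical name, rows are treated as variables (as np.corrcoef does by default), and the normalisation is done in place by broadcasting so that no second n x n array is allocated.

```python
import numpy as np

def corrcoef_f32(A):
    """Row-wise correlation coefficients, kept in float32 throughout.

    Hypothetical helper; rows are variables, as with np.corrcoef's default.
    """
    A = np.asarray(A, dtype=np.float32)
    X = A - A.mean(axis=1, keepdims=True)   # centre each row
    cov = X.dot(X.T)                        # unnormalised; the 1/n factor cancels below
    d = np.sqrt(np.diag(cov))               # per-row norms (copied out by np.sqrt)
    cov /= d[:, None]                       # scale rows in place
    cov /= d[None, :]                       # scale columns in place
    return cov

rng = np.random.default_rng(0)
A = rng.random((200, 265), dtype=np.float32)
corr = corrcoef_f32(A)
```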
I still run out of memory and I'm using a large memory machine, 264GB
I've been told that the fast C libraries probably use a routine that breaks the
dot product up into pieces and that, to optimize this, the number of elements is padded to a power of 2.
I don't really need to compute the symmetric half of the correlation coefficient matrix.
However, I don't see a way to do this in reasonable amount of time doing it "manually", with python loops.
Does anybody know of a way to ask numpy for a decent dot product routine, that balances memory usage with speed...?
Cheers
UPDATE:
Funny how writing these questions tends to help me find the language for a better google query.
Found this:
http://wiki.scipy.org/PerformanceTips
Not sure that I follow it....so, please comment or provide answers about this solution, your own ideas, or just general commentary on this type of problem.
TIA
EDIT: I apologize because my array is really much bigger than I thought.
array size is actually 151,000 x 265
I'm running out of memory on a machine with 264 GB with at least 230 GB free.
I'm surprised that the numpy call to blas dgemm and being careful with C order arrays
didn't do squat.
Python compiled with intel's mkl will run this with 12GB of memory in about 30 seconds:
>>> A = np.random.rand(50000,265).astype(np.float32)
>>> A.dot(A.T)
array([[ 86.54410553, 64.25226593, 67.24698639, ..., 68.5118103 ,
64.57299805, 66.69223785],
...,
[ 66.69223785, 62.01016235, 67.35866547, ..., 66.66306305,
65.75863647, 86.3017807 ]], dtype=float32)
If you do not have access to Intel's MKL, download Python Anaconda and install the accelerate package, which contains an MKL build and is available as a 30-day trial or free for academics. Various other C++ BLAS libraries should also work; even if the library copies the array from C to F order, it should not take more than ~30GB of memory.
The only thing that I can think of that your installation is trying to do is try to hold the entire 50,000 x 50,000 x 265 array in memory which is quite frankly terrible. For reference a float32 50,000 x 50,000 array is only 10GB, while the aforementioned array is 2.6TB...
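The sizes quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using decimal units (1 GB = 1e9 bytes).

```python
import numpy as np

n, k = 50_000, 265
itemsize = np.dtype(np.float32).itemsize      # 4 bytes per float32

input_bytes = n * k * itemsize                # A itself
result_bytes = n * n * itemsize               # A.dot(A.T)
broadcast_bytes = n * n * k * itemsize        # a full (n, n, k) intermediate

print(f"A:          {input_bytes / 1e6:.0f} MB")
print(f"A.dot(A.T): {result_bytes / 1e9:.0f} GB")
print(f"(n, n, k):  {broadcast_bytes / 1e12:.2f} TB")
```

By the same arithmetic, the 151,000-row case from the question needs about 91 GB for a float32 result and about 182 GB for a float64 one, before counting any temporary copies.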
If it's a gemm issue you can try a chunk gemm formula:
import numpy as np

def chunk_gemm(A, B, csize):
    # Compute A.dot(B) one (csize x csize) output block at a time
    out = np.empty((A.shape[0], B.shape[1]), dtype=A.dtype)
    for i in range(0, A.shape[0], csize):
        iend = i + csize
        for j in range(0, B.shape[1], csize):
            jend = j + csize
            out[i:iend, j:jend] = np.dot(A[i:iend], B[:, j:jend])
    return out
This will be slower, but will hopefully get over your memory issues.
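Since the question notes that only one triangle of the symmetric result is really needed, the chunked approach can be adapted to multiply only the blocks on or above the diagonal. The following is a sketch of that variation, not a tuned implementation; `chunk_gram` is a hypothetical name and the small array is only there to make the example checkable.

```python
import numpy as np

def chunk_gram(A, csize):
    """Compute A.dot(A.T) block by block, using symmetry.

    Hypothetical variant of chunk_gemm: only blocks on or above the
    diagonal are multiplied; the rest are filled in by transposition.
    """
    n = A.shape[0]
    out = np.empty((n, n), dtype=A.dtype)
    for i in range(0, n, csize):
        iend = min(i + csize, n)
        for j in range(i, n, csize):
            jend = min(j + csize, n)
            block = np.dot(A[i:iend], A[j:jend].T)
            out[i:iend, j:jend] = block
            if j > i:
                out[j:jend, i:iend] = block.T   # mirror into the lower triangle
    return out

rng = np.random.default_rng(0)
A = rng.random((103, 7), dtype=np.float32)
G = chunk_gram(A, csize=25)
```

If the lower triangle is never read, the mirroring step can be dropped, which roughly halves the multiplication work but not the size of `out`.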
You can try and see if np.einsum works better than dot for your case:
cov = np.einsum('ij,kj->ik', A, A) / n
The internal workings of dot are a little obscure, as it tries to use BLAS optimized routines, which sometimes require copies of arrays to be in Fortran order, not sure if that's the case here. einsum will buffer its inputs, and use vectorized SIMD operations where possible, but outside that it is basically going to run the naive three nested loops to compute the matrix product.
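To make the subscript string concrete, here is a tiny check (on a stand-in array) that 'ij,kj->ik' is the same contraction as A.dot(A.T).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4), dtype=np.float32)

# 'ij,kj->ik': multiply A[i, j] by A[k, j] and sum over the shared index j,
# which is the (i, k) entry of A.dot(A.T).
via_einsum = np.einsum('ij,kj->ik', A, A)
via_dot = A.dot(A.T)
```

Both results stay in float32 here; which one is faster or lighter on memory for the large case would have to be measured.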
UPDATE: It turns out the dot product completed without error, but upon careful inspection
the output array consists of zeros from column 95,000 to the end of the 151,000 columns.
That is, out[:,94999] = non-zero but out[:,95000] = 0 for all rows...
This is super annoying...
Another Blas description
The exchange mentions something that I thought about too: since BLAS is Fortran, shouldn't
the order of the input be F order? Yet the SciPy doc page below says C order.
Trying F order caused a segmentation fault. So I'm back to square one.
ORIGINAL POST
I finally tracked down my problem, which was in the details as usual.
I'm using an array of np.float32 which were stored as F order. I can't control the F order to my knowledge, since the data is loaded from images using an imaging library.
import numpy as np
import scipy.linalg.blas
roi = np.ascontiguousarray( roi )# see roi.flags below
out = scipy.linalg.blas.sgemm(alpha=1.0, a=roi, b=roi, trans_b=True)
This level 3 blas routine does the trick. My problem was two fold:
roi.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
And... I was using BLAS dgemm, NOT sgemm. The 'd' is for 'double' and the 's' for 'single'.
See this pdf: BLAS summary pdf
I looked at it once and was overwhelmed...I went back and read the wikipedia article on blas routines to understand level 3 vs other levels: wikipedia article on blas
Now it works on A = 150,000 x 265, performing:
A \dot A.T
Thanks everyone for your thoughts...knowing that it could be done was most important.
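For anyone wanting to reproduce the fix on a small scale, here is a sketch with a random float32 array standing in for the image data. The array size and variable names other than `roi` are assumptions made for the example.

```python
import numpy as np
from scipy.linalg.blas import sgemm

rng = np.random.default_rng(0)
# Stand-in for the image data: float32, stored in Fortran order.
roi = np.asfortranarray(rng.random((300, 265), dtype=np.float32))
print(roi.flags['F_CONTIGUOUS'], roi.dtype)

# Same steps as in the post: make the array C-contiguous, then call the
# single-precision level 3 routine so nothing is promoted to float64.
roi = np.ascontiguousarray(roi)
out = sgemm(alpha=1.0, a=roi, b=roi, trans_b=True)
```

The result `out` is float32, so it takes half the memory that the dgemm call would need for the same shape.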