Most efficient way to add two CSR sparse matrices with the same sparsity pattern in python - numpy

I am using sparse matrices in python, namely
scipy.sparse.csr_matrix
I am in principle free to choose the exact sparse implementation, as long as the matrices support matrix-vector multiplication and addition/subtraction of matrices with the same sparsity pattern. Currently, at every time step, I construct a new sparse matrix from scratch and add it to the existing matrix. I believe my code could be losing time unnecessarily on two things:
Construction of the new sparse matrix at each step.
Addition of the sparse matrices, assuming that the underlying algorithm in the CSR implementation has to find matching sparse entries before adding them up.
My guess is that the sparse matrix is internally stored as a numpy array of values plus a few index arrays denoting where those values are located. The question is whether it is possible to add to the underlying value array directly, without touching the sparsity structure. Is something like this possible?
new_values = np.linspace(0, 1, csr_mat.nnz)  # one new value per stored entry
csr_mat.data += new_values  # .data (not .val) holds the stored values
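It is: scipy exposes the three CSR arrays as .data, .indices and .indptr, and when two matrices share identical indices and indptr, adding the .data arrays is equivalent to adding the matrices while skipping the pattern-matching step. A minimal sketch:
import numpy as np
import scipy.sparse as sp

# Two matrices built with identical indices/indptr, i.e. the same sparsity pattern.
A = sp.csr_matrix(([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]), shape=(2, 3))
B = sp.csr_matrix(([4.0, 5.0, 6.0], [0, 2, 1], [0, 2, 3]), shape=(2, 3))

# Only safe when the patterns really match; check once if in doubt.
assert np.array_equal(A.indices, B.indices) and np.array_equal(A.indptr, B.indptr)

A.data += B.data  # in-place update of the values; equivalent to A = A + B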

Related

Get filtered co-occurrences from a sparse matrix

I have a scipy csr sparse matrix (a document-term matrix created with scikit-learn’s CountVectorizer).
The matrix is huge (9.5M documents by 78M distinct tokens):
<9482138x78045191 sparse matrix of type '<class 'numpy.int64'>'
with 394161806 stored elements in Compressed Sparse Row format>
I want to reduce the matrix so that it contains only some of the tokens (columns). I have these in a list, and can extract them into a new matrix like this:
tokenarrays = []
for tokenid in tokenids_we_want[:3]:
    tokenarray = dtm.getcol(tokenid).toarray()
    tokenarrays.append(tokenarray)
Maybe there is a better way, using some kind of filtering method.
My end goal is a co-occurrence matrix of the wanted tokens. As a first step towards this, I have created a pandas dataframe using:
pd.DataFrame(zip(tokenids_we_want, tokenarrays))
This does not result in one pandas column per token in the tokenarray, however.
There is probably a better way altogether to do what I need:
Start from a sparse document-term matrix
Keep only some of the terms (tokens)
Get term co-occurrences
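For what it's worth, a sketch of one way to do all three steps with scipy alone (assuming dtm is the CSR document-term matrix and tokenids_we_want is the list of column indices to keep):
# 1. Keep only the wanted columns; the result stays sparse.
sub = dtm[:, tokenids_we_want]

# 2. Binarize, so co-occurrence counts documents rather than token frequencies.
sub.data[:] = 1

# 3. (tokens x docs) @ (docs x tokens): entry (i, j) is the number of
#    documents containing both token i and token j.
cooc = sub.T.dot(sub).toarray()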

Is there any obvious reason that TensorFlow uses COO format rather than CSR for sparse matrices?

I'm trying to take advantage of TensorFlow's built-in sparse matrix multiplication API.
keveman recommended tf.embedding_lookup_sparse as the right way.
But the performance of embedding_lookup_sparse has been somewhat disappointing in my experiments. Even though the matrix multiplications involved are fairly small, <1, 3196> by <3196, 1024>, sparse matmul with 0.1 sparsity fails to beat dense matrix multiplication.
If my implementation is correct, I think one of the reasons is that TensorFlow uses the COO format, which stores an index pair for every nonzero entry. I'm not an expert in this domain, but isn't it widely known that the CSR format performs better on this kind of computation? Is there any obvious reason that TensorFlow internally uses COO rather than CSR for its sparse matrix representation?
Just for the record, you say matrix multiplication, but one of your matrices is in fact a vector (1 x 3196). So this would make it a matrix-vector multiplication (different BLAS kernel). I will assume you mean matrix-vector multiplication for my answer.
Yes, CSR should theoretically be faster than COO for matrix-vector multiplication: the storage size in CSR format is O(2·nnz + n) versus O(3·nnz) for COO, and sparse matrix-vector multiplication is in many cases memory bound.
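For intuition, the storage difference is easy to measure with scipy (illustrative only; TensorFlow's internal COO layout differs):
import scipy.sparse as sp

A = sp.random(3196, 1024, density=0.1, format="coo", random_state=0)

coo_bytes = A.data.nbytes + A.row.nbytes + A.col.nbytes         # 3 * nnz entries
B = A.tocsr()
csr_bytes = B.data.nbytes + B.indices.nbytes + B.indptr.nbytes  # 2 * nnz + (n + 1)
print(coo_bytes, csr_bytes)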
The exact performance difference compared to a dense matrix multiplication varies, though, with problem size, sparsity pattern, data type and implementation. It is difficult to say off the bat which should be faster, because the sparse storage format introduces indirection, which potentially leads to reduced locality and poor(er) utilisation of arithmetic units (e.g. no use of vectorisation).
Particularly when the matrix and vector are so small that almost everything fits in cache, I would expect limited performance benefits. Sparse matrix structures are typically most useful for truly large matrices, from tens of thousands on a side up to billions, which wouldn't even fit in main memory in a dense representation. For small problem sizes, in my experience, the storage advantage over dense formats is usually negated by the loss in locality and arithmetic efficiency. To some extent this is addressed by hybrid storage formats (such as Block CSR), which try to take the best of both worlds and are very useful for some applications (it doesn't look like TensorFlow supports them).
In TensorFlow, I would assume the COO format is used because it is more efficient for other operations; for example, it supports O(1) updates, insertions and deletions from the data structure. It seems reasonable to trade ~50% performance in sparse matrix-vector multiply to improve performance on those operations.

Avoiding multiplying zeros in matrix solver

Is there a way to avoid multiplying zeros in an inner loop? As a laughable test I tried a conditional to skip the multiplication when a zero is encountered, and of course this is slower than just doing the multiplication. My preference is to leave the LU matrix intact rather than rearranging it to make the zeros disappear (sparse storage). In this instance the language is VBA, prior to conversion to VB.NET.
For k = 1 To i - 1
    If LU(j, k) <> 0 And LU(k, i) <> 0 Then temp = temp - LU(j, k) * LU(k, i)
Next k
Thanks.
It is impossible to avoid multiplying by zeroes if you want to preserve the matrix structure.
Furthermore, sparse matrices are not supported in VBA, so you would have to code your own sparse matrix class: the idea is that instead of storing the entire matrix, you store only index/value pairs.
A sparse matrix class would include methods to:
create a sparse matrix from given values in index/value form;
create a sparse matrix from given values in array form;
multiply two sparse matrices (including the special case of a sparse matrix times a sparse vector).
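For illustration, a minimal sketch of the index/value idea (in Python rather than VBA, since the data structure is language-independent; the class and method names are hypothetical):
class SparseMatrix:
    # Minimal sketch: only nonzero entries are stored as (row, col) -> value.
    def __init__(self, shape):
        self.shape = shape
        self.vals = {}

    @classmethod
    def from_array(cls, a):
        m = cls((len(a), len(a[0])))
        for i, row in enumerate(a):
            for j, v in enumerate(row):
                if v != 0:              # zeros are simply never stored
                    m.vals[(i, j)] = v
        return m

    def matvec(self, x):
        y = [0.0] * self.shape[0]
        for (i, j), v in self.vals.items():
            y[i] += v * x[j]            # the loop touches nonzeros only
        return y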
Macroman, I meant to skip the calculation when zeros are encountered, to speed up the solution. Thanks.
Titus, I already wrote a fully pivoted LU decomposition solver for VBA which can (slowly) solve sparse matrices. I just wanted to see whether it was feasible to convert the solver to sparse techniques. Memory is not an issue, hence my preference to avoid the index/value storage technique; I just wanted to see if there was a fast way to make the solver skip zeros to speed it up. Thanks.

Is it possible to build coo and csr matrices with numpy WITHOUT using scipy?

I have to operate on matrices using an equivalent of scipy's sparse.coo_matrix and sparse.csr_matrix. However, I cannot use scipy (it is incompatible with the image analysis software I want to use this in). I can, however, use numpy.
Is there an easy way to accomplish what scipy.sparse.coo_matrix and scipy.sparse.csr_matrix do, with numpy only?
Thanks!
The attributes of a sparse.coo_matrix are:
dtype : dtype
    Data type of the matrix
shape : 2-tuple
    Shape of the matrix
ndim : int
    Number of dimensions (this is always 2)
nnz
    Number of nonzero elements
data
    COO format data array of the matrix
row
    COO format row index array of the matrix
col
    COO format column index array of the matrix
The data, row and col arrays are essentially the data, i, j parameters when the matrix is defined with coo_matrix((data, (i, j)), [shape=(M, N)]). shape also comes from the definition, dtype from the data array, and nnz as a first approximation is the length of data (not accounting for zeros and duplicate coordinates).
So it is easy to construct a COO-like object. Similarly, a lil matrix has two lists of lists, and a dok matrix is a dictionary (see its .__class__.__mro__).
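A minimal numpy-only COO-like container might look like this (a sketch of the structure above, not a drop-in scipy replacement):
import numpy as np

class CooArray:
    # Bare-bones COO container: a value array plus row/col index arrays.
    def __init__(self, data, row, col, shape):
        self.data = np.asarray(data)
        self.row = np.asarray(row)
        self.col = np.asarray(col)
        self.shape = shape
        self.ndim = 2
        self.dtype = self.data.dtype

    @property
    def nnz(self):
        return len(self.data)  # first approximation, as noted above

    def toarray(self):
        out = np.zeros(self.shape, dtype=self.dtype)
        # np.add.at sums duplicate coordinates, matching coo_matrix behaviour
        np.add.at(out, (self.row, self.col), self.data)
        return out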
The data structure of a csr matrix is a bit more obscure:
data
    CSR format data array of the matrix
indices
    CSR format index array of the matrix
indptr
    CSR format index pointer array of the matrix
It still has 3 arrays. And they can be derived from the coo arrays. But doing so with pure Python code won't be nearly as fast as the compiled scipy functions.
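For example, a numpy-only derivation of the CSR arrays from COO triplets might look like this (a sketch assuming duplicate coordinates have already been summed):
import numpy as np

def coo_to_csr(data, row, col, n_rows):
    # Sort entries by row; a stable sort keeps column order within each row.
    order = np.argsort(row, kind="stable")
    csr_data, indices = data[order], col[order]
    # indptr[i]:indptr[i+1] delimits the entries of row i.
    counts = np.bincount(row, minlength=n_rows)
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return csr_data, indices, indptr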
But these classes have a lot of functionality that would require a lot of work to duplicate. Some is pure Python, but critical pieces are compiled for speed. Particularly important are the mathematical operations that the csr_matrix implements, such as matrix multiplication.
Replicating the data structures for temporary storage is one thing; replicating the functionality is quite another.

Reshaped views in Parallel Colt

In numpy, there is a flatten operation which allows you to, for example, flatten an m x n matrix down to an array of mn elements, and a reshape operation which goes in the opposite direction. Much of the time this can be done with a view, without creating a copy of the original data.
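For comparison, this is the numpy behaviour in question (ravel and reshape return views when the memory layout allows, while flatten always copies):
import numpy as np

a = np.arange(6).reshape(2, 3)           # reshape of a contiguous array: a view
flat = a.ravel()                         # contiguous, so also a view
flat[0] = 99
print(a[0, 0])                           # 99 -> same underlying buffer
print(np.shares_memory(a, flat))         # True
print(np.shares_memory(a, a.flatten()))  # False: flatten always copies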
Does such a capability exist in Parallel Colt, the Java matrix library? I have not been able to find one. There is a reshape method on one-dimensional matrices, but it appears to create copies.