Tensorflow Sparse Tensor operation too slow

I'm trying to convert a scipy sparse matrix to Tensorflow Sparse Tensor using the code below:
coo = norm_adj_mat.tocoo().astype(np.float32) ## norm_adj_mat is the scipy CSR matrix
indices = np.mat([coo.row, coo.col]).transpose()
A_tilde = tf.SparseTensor(indices, coo.data, coo.shape)
My original matrix is very large (more than 1 million rows and columns), and the TensorFlow conversion takes more than 20 hours. With a toy matrix the same code works fine. Any suggestions on how to speed up this step?
I'm using tensorflow 2.9.1 & scipy 1.9.

Related

Working with sparse matrices in numpy and sklearn

I have a time series dataset generated from some electrophysiological data. It is a frequency dataset, and the matrix is sparse but huge: it contains 0.005 s time bins for 2000+ neurons recorded over an hour. I am using it to train regressions in sklearn, and I was wondering whether there is a way to represent the matrix more efficiently to speed up my code. Most of the data consists of 0 values.
Specifically, I will be using these two functions on the matrix:
https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html
Will scipy.sparse run with sklearn functions, and will it train faster using scipy.sparse or other options?
https://docs.scipy.org/doc/scipy/reference/sparse.html

Converting tensorflow dataset to numpy array

I have an autoencoder defined using tf.keras in TensorFlow 1.15. I cannot upgrade to TensorFlow 2.0 for some specific reasons.
This particular autoencoder is used for anomaly detection. I currently compute the AUC score of the autoencoder as follows:
1. All anomalous inputs are labelled 1 and all normal inputs are labelled 0. This is y_true.
2. I feed the autoencoder with unseen inputs and then measure the reconstruction error, like so: errors = np.mean(np.square(data - model.predict(data)), axis=-1)
3. The mean of this array is then taken as the predicted label, y_pred.
4. I then compute the AUC using auc = metrics.roc_auc_score(y_true, y_pred).
This approach works well. I now need to move to tf.data.Dataset for feeding in my data; previously, I used NumPy arrays. The issue is that I am unable to convert a tf.data.Dataset to a NumPy array, and hence unable to compute the mean squared error as in step 2.
Once I have a tf.data.Dataset, I feed it for prediction like so: results = model.predict(x_test)
This yields a NumPy array, results. I want to compute the mean squared error between results and x_test. However, x_test is of type tf.data.Dataset. So the question is: how can I convert a tf.data.Dataset to a NumPy array in TensorFlow 1.15, or what is an alternative way to do this?

Why performance and speed of my tensorflow matrix factorization code is slower than numpy code?

I am unfamiliar with TensorFlow.
I am trying to replace NumPy matrix factorization code with TensorFlow. However, both the model performance and the speed are worse than with NumPy. What's wrong with my code?
TensorFlow code: https://github.com/choco9966/Research/blob/master/Matrix%20Factorization/%5BTensorflow%5D%20MatrixFactorization.ipynb
Original NumPy code: https://github.com/choco9966/Research/blob/master/Matrix%20Factorization/%5BNumpy%5D%20MatrixFactorization.ipynb

Construct adj matrix in Tensorflow with adj list

My model requires an adjacency matrix, which is currently created in NumPy and passed to TensorFlow through a placeholder.
As the problem size grows, I suspect that the transfer between main memory and VRAM becomes a bottleneck, since the matrix size is quadratic in the dimension. For example, I use dim 400, which results in 160,000 matrix values.
As the adjacency matrix is sparse, I thought about passing an adjacency list and then creating the adjacency matrix in TensorFlow on the GPU.
Any suggestions?
Thanks

TensorFlow supports sparse placeholders. The page https://www.tensorflow.org/api_docs/python/tf/sparse_placeholder
has an example showing how to use tf.sparse_placeholder.
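As a rough sketch of the adjacency-list idea, the helper below turns an undirected edge list into the (indices, values, dense_shape) triple that a sparse placeholder takes in feed_dict, so only the nonzero entries cross from host memory to the device. The function name edges_to_sparse_feed, the example edges, and the assumption that the graph is undirected and unweighted are all hypothetical.

```python
import numpy as np


def edges_to_sparse_feed(edges, num_nodes):
    """Turn an undirected edge list into (indices, values, dense_shape)."""
    edges = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    # Add the reverse of every edge so the resulting matrix is symmetric.
    both = np.concatenate([edges, edges[:, ::-1]], axis=0)
    # Drop duplicate entries; np.unique also sorts the rows
    # lexicographically, which gives row-major order.
    indices = np.unique(both, axis=0)
    values = np.ones(len(indices), dtype=np.float32)
    dense_shape = np.array([num_nodes, num_nodes], dtype=np.int64)
    return indices, values, dense_shape


# Two example edges in a graph with 400 nodes (hypothetical data).
indices, values, dense_shape = edges_to_sparse_feed([(0, 1), (1, 2)], num_nodes=400)
```

If the model needs a dense matrix, the fed sparse tensor can be expanded inside the graph with tf.sparse.to_dense (tf.sparse_tensor_to_dense in older 1.x releases), so that the 160,000 dense values are never copied from the host.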

How to read SciPy sparse matrix into Tensorflow's placeholder

Dense data can be read this way:
# tf - tensorflow, np - numpy, sess - session
m = np.ones((2, 3))
placeholder = tf.placeholder(tf.int32, shape=m.shape)
sess.run(placeholder, feed_dict={placeholder: m})
How can I read a SciPy sparse matrix (for example scipy.sparse.csr_matrix) into tf.placeholder or maybe tf.sparse_placeholder?

I think that TF currently does not have a good way to read sparse data. If you do not want to convert your sparse matrix into a dense one, you can try to construct a sparse tensor.
Here is what the official tutorial says:
SparseTensors don't play well with queues. If you use SparseTensors
you have to decode the string records using tf.parse_example after
batching (instead of using tf.parse_single_example before batching).

To feed a SciPy sparse matrix to a TF placeholder:
Option 1: use tf.sparse_placeholder. The question "Use coo_matrix in TensorFlow" shows how to feed data to a sparse placeholder.
Option 2: convert the sparse matrix to a dense NumPy matrix and feed it to tf.placeholder (of course, this is not possible when the dense matrix does not fit in memory).
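A minimal sketch of Option 1, assuming the tf.compat.v1 API (available in TensorFlow 1.15 and 2.x): in graph mode, a sparse placeholder can be fed a tuple of (indices, values, dense_shape). The helper names sparse_feed_value and row_sums_via_placeholder, and the row-sum operation used as a stand-in for a real model, are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def sparse_feed_value(mat):
    """Convert a SciPy sparse matrix to an (indices, values, dense_shape) tuple."""
    coo = mat.tocoo()
    indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
    values = coo.data.astype(np.float32)
    dense_shape = np.asarray(coo.shape, dtype=np.int64)
    return indices, values, dense_shape


def row_sums_via_placeholder(mat):
    """Feed the matrix through a sparse placeholder and sum each row.

    TensorFlow is imported here so that sparse_feed_value stays NumPy/SciPy-only.
    """
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()  # placeholders require graph mode
    ph = tf.sparse_placeholder(tf.float32)
    row_sums = tf.sparse.reduce_sum(ph, axis=1)
    with tf.Session() as sess:
        return sess.run(row_sums, feed_dict={ph: sparse_feed_value(mat)})


# 3x3 identity as a small example matrix (hypothetical data).
feed = sparse_feed_value(sp.csr_matrix(np.eye(3)))
```

The dense matrix is never materialized on the host here, which is the point of preferring Option 1 when Option 2 would run out of memory.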
