TensorFlow SparseTensor operation too slow

I'm trying to convert a SciPy sparse matrix to a TensorFlow SparseTensor using the code below:
import numpy as np
import tensorflow as tf
coo = norm_adj_mat.tocoo().astype(np.float32)  # norm_adj_mat is the SciPy CSR matrix
indices = np.mat([coo.row, coo.col]).transpose()  # (nnz, 2) row/column index pairs
A_tilde = tf.SparseTensor(indices, coo.data, coo.shape)
My original matrix is very large (more than 1 million rows and columns), and the TensorFlow conversion takes forever (more than 20 hours). With a toy matrix it works fine. Any suggestions on how to speed up this step?
I'm using TensorFlow 2.9.1 and SciPy 1.9.
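One thing worth trying is a fully vectorized version of the same conversion in which the indices are a plain int64 ndarray (built with np.column_stack) rather than the np.matrix that np.mat returns; int64 is the dtype tf.SparseTensor uses for indices. This is a minimal sketch assuming TensorFlow 2.x eager mode. Whether it removes the 20-hour slowdown on the full matrix is an assumption to verify, not a confirmed diagnosis, and the helper name scipy_to_sparse_tensor is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import tensorflow as tf


def scipy_to_sparse_tensor(matrix):
    """Convert a SciPy sparse matrix to a tf.SparseTensor (hypothetical helper)."""
    coo = matrix.tocoo().astype(np.float32)
    # Plain (nnz, 2) int64 ndarray instead of an np.matrix.
    indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
    sparse = tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)
    # SciPy COO entries are not guaranteed to be in row-major order,
    # which many tf.sparse ops expect, so reorder once here.
    return tf.sparse.reorder(sparse)


# Usage on a small random CSR matrix standing in for norm_adj_mat.
norm_adj_mat = sp.random(1000, 1000, density=0.001, format="csr", random_state=0)
A_tilde = scipy_to_sparse_tensor(norm_adj_mat)
```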

Related

Working with sparse matrices in numpy and sklearn

I have a time-series dataset generated from electrophysiological data. I have a frequency dataset, and the matrix is very sparse but huge: it contains 0.005 s time bins for 2000+ neurons recorded over an hour. I am using it to train regressions in sklearn, and I was wondering whether there are ways to represent the matrix more efficiently to speed up my code. Most of the data is zeros.
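A back-of-the-envelope calculation shows why the dense representation hurts. It uses only the figures from the question (0.005 s bins, 2000 neurons, one hour) plus a hypothetical 1% density, since the real density is not stated; plug in your own value.

```python
# Dense vs. CSR memory estimate for the binned spike matrix described above.
recording_s = 3600          # one hour
bin_s = 0.005               # 5 ms bins
n_neurons = 2000
density = 0.01              # hypothetical fraction of non-zero entries

n_bins = round(recording_s / bin_s)          # 720,000 rows
n_entries = n_bins * n_neurons               # 1.44e9 cells

dense_bytes = n_entries * 8                  # float64 storage
nnz = int(n_entries * density)
# CSR stores one float64 value and one int32 column index per non-zero,
# plus an int32 row-pointer array of length n_rows + 1.
csr_bytes = nnz * (8 + 4) + (n_bins + 1) * 4

print(f"dense: {dense_bytes / 1e9:.1f} GB, CSR: {csr_bytes / 1e9:.2f} GB")
```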
Specifically, I will be using these two estimators on the matrix:
https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html
Will scipy.sparse work with these sklearn functions, and will training be faster with scipy.sparse or with some other option?
https://docs.scipy.org/doc/scipy/reference/sparse.html
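scikit-learn's ElasticNetCV.fit accepts a SciPy sparse matrix for X when y is single-output, so that estimator does not need a dense copy. As far as I know, CCA does not accept sparse input and needs a dense array, so check its documentation before relying on it. The sketch below uses small random data as a stand-in for the recording; the shapes, density, and l1_ratio grid are arbitrary choices.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)

# Small random stand-in for the binned matrix: rows = time bins, columns = neurons.
X = sp.random(2000, 50, density=0.05, format="csr", random_state=0)
true_coef = rng.normal(size=50)
y = X @ true_coef + 0.01 * rng.normal(size=2000)

# ElasticNetCV is fit on the CSR matrix directly, without densifying it.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3)
model.fit(X, y)
print(model.alpha_, model.l1_ratio_)
```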

Converting tensorflow dataset to numpy array

I have an autoencoder defined using tf.keras in TensorFlow 1.15. For specific reasons, I cannot upgrade to TensorFlow 2.0.
This autoencoder is used for anomaly detection. I currently compute its AUC score as follows:
1. All anomalous inputs are labelled 1 and all normal inputs are labelled 0. This is y_true.
2. I feed the autoencoder unseen inputs and measure the reconstruction error, like so: errors = np.mean(np.square(data - model.predict(data)), axis=-1)
3. This array of per-sample mean errors is then taken as the predicted label, y_pred.
4. I then compute the AUC using auc = metrics.roc_auc_score(y_true, y_pred).
This approach works well. I now need to move to using tf.data.Dataset to feed in my data; previously, I used NumPy arrays. The issue is that I am unable to convert a tf.data.Dataset to a NumPy array, and hence unable to compute the mean squared error as in step 2.
Once I have a tf.data.Dataset, I feed it for prediction like so: results = model.predict(x_test)
This yields a NumPy array, results. I want to compute the mean squared error between results and x_test. However, x_test is of type tf.data.Dataset. So the question is: how can I convert a tf.data.Dataset to a NumPy array in TensorFlow 1.15, or what is an alternative way to do this?
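One possible approach in TensorFlow 1.15 graph mode is to iterate over the dataset in a session and concatenate the batches. This is a sketch under assumptions: each dataset element is a single batched tensor of inputs (not an (x, y) tuple), and the dataset is not shuffled, so its order matches the rows returned by model.predict. The names dataset_to_numpy and make_dataset are hypothetical. The tf.compat.v1 calls used here also exist in TF 2.x; the code builds its own graph so it runs in either version.

```python
import numpy as np
import tensorflow as tf


def dataset_to_numpy(make_dataset):
    """Materialize a finite dataset of batched arrays as one NumPy array.

    make_dataset is a zero-argument callable that builds the dataset, so the
    dataset is created in the same graph as its iterator.
    """
    graph = tf.Graph()
    with graph.as_default():
        dataset = make_dataset()
        next_batch = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
        batches = []
        with tf.compat.v1.Session(graph=graph) as sess:
            while True:
                try:
                    batches.append(sess.run(next_batch))
                except tf.errors.OutOfRangeError:
                    break  # the dataset is exhausted
    return np.concatenate(batches, axis=0)


# Usage with toy data standing in for x_test.
data = np.arange(12, dtype=np.float32).reshape(4, 3)
x_test_np = dataset_to_numpy(lambda: tf.data.Dataset.from_tensor_slices(data).batch(2))
# With results = model.predict(x_test), the per-sample errors would then be:
# errors = np.mean(np.square(x_test_np - results), axis=-1)
```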

Why are the performance and speed of my TensorFlow matrix factorization code worse than my NumPy code?

I am unfamiliar with TensorFlow.
I am trying to replace my NumPy matrix factorization code with TensorFlow, but both the model performance and the speed are worse than with NumPy. What's wrong with my code?
TensorFlow code: https://github.com/choco9966/Research/blob/master/Matrix%20Factorization/%5BTensorflow%5D%20MatrixFactorization.ipynb
Baseline NumPy code: https://github.com/choco9966/Research/blob/master/Matrix%20Factorization/%5BNumpy%5D%20MatrixFactorization.ipynb

Construct adj matrix in Tensorflow with adj list

My model requires an adjacency matrix, which is currently created in NumPy and passed to TensorFlow through a placeholder.
As the problem size grows, I suspect the I/O between host memory and VRAM becomes a bottleneck, since the matrix size is quadratic in the number of nodes. For example, with dimension 400 the matrix has 160,000 entries.
Since the adjacency matrix is sparse, I thought about passing an adjacency list instead and building the adjacency matrix in TF on the GPU.
Any suggestions?
Thanks
TensorFlow supports sparse placeholders. This page https://www.tensorflow.org/api_docs/python/tf/sparse_placeholder
has an example showing how to use tf.sparse_placeholder.
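Building the matrix on the TF side from an edge list can be sketched as follows, assuming TensorFlow 2.x eager execution, where the placeholder is replaced by passing a tensor into a tf.function. Only the (E, 2) edge list needs to be transferred; the dense matrix is created on whatever device TF places the op on (the GPU, if one is available). The helper name edges_to_adjacency is hypothetical, and treating the graph as undirected is an assumption.

```python
import numpy as np
import tensorflow as tf


@tf.function
def edges_to_adjacency(edges, num_nodes):
    """Build a dense, symmetric 0/1 adjacency matrix from an int64 (E, 2) edge list."""
    # Add each edge in both directions so the graph is undirected.
    both = tf.concat([edges, tf.reverse(edges, axis=[1])], axis=0)
    ones = tf.ones_like(both[:, 0], dtype=tf.float32)
    shape = tf.stack([num_nodes, num_nodes])
    # scatter_nd sums duplicate indices (e.g. self-loops), so clip back to 0/1.
    adj = tf.scatter_nd(both, ones, shape)
    return tf.minimum(adj, 1.0)


# Usage: a path graph 0-1-2-3.
edges = np.array([[0, 1], [1, 2], [2, 3]], dtype=np.int64)
adj = edges_to_adjacency(tf.constant(edges), tf.constant(4, dtype=tf.int64))
```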

How to read SciPy sparse matrix into Tensorflow's placeholder

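It's possible to read dense data this way:
# TensorFlow 1.x: tf - tensorflow, np - numpy, sess - an existing tf.Session
import numpy as np
import tensorflow as tf
m = np.ones((2, 3), dtype=np.int32)  # dtype matches the int32 placeholder
placeholder = tf.placeholder(tf.int32, shape=m.shape)
sess.run(placeholder, feed_dict={placeholder: m})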
How can I read a SciPy sparse matrix (for example, scipy.sparse.csr_matrix) into a tf.placeholder, or perhaps a tf.sparse_placeholder?
I think TF currently does not have a good way to read sparse data. If you do not want to convert your sparse matrix into a dense one, you can try to construct a sparse tensor.
Here is what the official tutorial says:
SparseTensors don't play well with queues. If you use SparseTensors
you have to decode the string records using tf.parse_example after
batching (instead of using tf.parse_single_example before batching).
To feed a SciPy sparse matrix to a TF placeholder:
Option 1: use tf.sparse_placeholder. The question "Use coo_matrix in TensorFlow" shows how to feed data to a sparse_placeholder.
Option 2: convert the sparse matrix to a dense NumPy array and feed it to tf.placeholder (of course, this is impossible when the dense matrix does not fit in memory).
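Both options can be sketched in TF 1.x-style graph mode. This assumes the tf.compat.v1 API (available in 1.15 and 2.x) and builds an explicit graph so it also runs under TF 2.x. The helper name csr_to_sparse_value and the row-sum computation are hypothetical, chosen only to show the feeds working.

```python
import numpy as np
import scipy.sparse as sp
import tensorflow as tf


def csr_to_sparse_value(matrix):
    """Convert a SciPy sparse matrix into a SparseTensorValue for feed_dict."""
    coo = matrix.tocoo()
    indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
    return tf.compat.v1.SparseTensorValue(indices, coo.data, coo.shape)


m = sp.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]], dtype=np.float32))

graph = tf.Graph()
with graph.as_default():
    # Option 1: a sparse placeholder fed with the converted SciPy matrix.
    sparse_ph = tf.compat.v1.sparse_placeholder(tf.float32, shape=m.shape)
    row_sums = tf.sparse.reduce_sum(sparse_ph, axis=1)
    # Option 2: a dense placeholder fed with m.toarray() (only if it fits in memory).
    dense_ph = tf.compat.v1.placeholder(tf.float32, shape=m.shape)
    dense_row_sums = tf.reduce_sum(dense_ph, axis=1)

    with tf.compat.v1.Session(graph=graph) as sess:
        sparse_result = sess.run(row_sums, feed_dict={sparse_ph: csr_to_sparse_value(m)})
        dense_result = sess.run(dense_row_sums, feed_dict={dense_ph: m.toarray()})
```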