How to read SciPy sparse matrix into Tensorflow's placeholder - tensorflow

It's possible to read dense data by this way:
# tf - tensorflow, np - numpy, sess - session
m = np.ones((2, 3))
placeholder = tf.placeholder(tf.int32, shape=m.shape)
sess.run(placeholder, feed_dict={placeholder: m})
How to read scipy sparse matrix (for example scipy.sparse.csr_matrix) into tf.placeholder or maybe tf.sparse_placeholder ?

I think that currently TF does not have a good way to read from sparse data. If you do not want to convert a your sparse matrix into a dense one, you can try to construct a sparse tensor..
Here is what official tutorial tells you:
SparseTensors don't play well with queues. If you use SparseTensors
you have to decode the string records using tf.parse_example after
batching (instead of using tf.parse_single_example before batching).

To feed SciPy sparse matrix to TF placeholder
Option 1: you need to use tf.sparse_placeholder. In Use coo_matrix in TensorFlow shows the way to feed data to a sparse_placeholder
Option 2: you need to convert sparse matrix to NumPy dense matrix and feed to tf.place_holder (of course, this way is impossible when the converted dense matrix is out of memory)

Related

How to dilate y_true inside a custom metric in keras/tensorflow?

I am trying to code a custom metric for U-net model implemented using keras/tensorflow. In the metric, I need to use the opencv function, 'cv2.dilate' on the ground truth. When I tried to use it, it gave the error as y_true is a tensor and cv2.dilate expects a numpy array.
Any idea on how to implement this?
I tried to convert tensor to numpy array but it is not working.
I searched for the tensorflow implementation of cv2.dilate but couldnt find one.
One possibility, if you are using a simple rectangular kernel in your dilation, is to use tf.nn.max_pool2d as a replacement.
import numpy as np
import tensorflow as tf
import cv2
image = np.random.random((28,28))
kernel_size = 3
# OpenCV dilation works with grayscale image, with H,W dimensions
dilated_cv = cv2.dilate(image, np.ones((kernel_size, kernel_size), np.uint8))
# TensorFlow maxpooling works with batch and channels: B,H,W,C dimenssions
image_w_batch_and_channels = image[None,...,None]
dilated_tf = tf.nn.max_pool2d(image_w_batch_and_channels, kernel_size, 1, "SAME")
# checking that the results are equal
np.allclose(dilated_cv, dilated_tf[0,...,0])
However, given that you mention that you are applying dilation on the ground truth, this dilation does not need to be differentiable. In that case, you can wrap your dilation in a tf.numpy_function
from functools import partial
# be sure to put the correct output type, tf.float64 is working in that specific case because numpy defaults to float64, but it might be different in your case
dilated_tf_npfunc = tf.numpy_function(
partial(cv2.dilate, kernel=np.ones((kernel_size, kernel_size), np.uint8)), [image]
)

Implement zigzag flatten NxN tensor with batches in TensorFlow

The problem can be described in zigzag scanning. However, I wonder if there's is a TensorFlow version of implementation by using something like tf.tensor_scatter_nd_update that TensorFlow suggests.
BxNxN tensor where B represents Batch.
I found a workaround by using 1x1 conv. Use numpy to generate a constant permutation conv kernel ( tf does not support eager tensor assignment... ), then
reshape tensor(BxNxN) to Bx1x1x(NxN) before applying tf.nn.conv2d to it. Finally do some reshape acrobat to flatten it.

Converting tensorflow dataset to numpy array

I have an autoencoder defined using tf.keras in tensorflow 1.15. I cannot upgrade to tensorflow to 2.0 for some specific reasons.
This particular autoencoder is used for anomaly detection. I currently compute the AUC score of the autoencoder as follows:
All anomalous inputs are labelled 1 and all normal inputs are labelled 0. This is y_true
I feed the autoencoder with unseen inputs and then measure the reconstruction error, like so: errors = np.mean(np.square(data - model.predict(data)), axis=-1)
The mean of this array is then said to the predicted label, y_pred.
I then compute the AUC using auc = metrics.roc_auc_score(y_true, y_pred).
This approach works well. I now need to move towards using tf.data.dataset to feed in my data. Previously, it was numpy arrays. The issue is, I am unable to convert tf.data.dataset to a numpy array and hence unable to compute the mean squared error as seen in 2.
Once I have a tf.data.Dataset, I feed it for prediction like so: results = model.predict(x_test)
This yields a numpy array, results. I want to compute the mean square error of results with x_test. However, x_test is of type tf.data.Dataset. So the question is, how can I convert a tf.data.dataset to a numpy array in tensorflow 1.15 or what is an alternative method to do this?

What does the .numpy() function do?

I tried searching for the documentation online but I can't find anything that gives me an answer. What does .numpy() function do? The example code given is:
y_true = []
for X_batch, y_batch in mnist_test:
y_true.append(y_batch.numpy()[0].tolist())
Both in Pytorch and Tensorflow, the .numpy() method is pretty much straightforward. It converts a tensor object into an numpy.ndarray object. This implicitly means that the converted tensor will be now processed on the CPU.
Ever getting a problem understanding some PyTorch function you may ask help().
import torch
t = torch.tensor([1,2,3])
help(t.numpy)
Out:
Help on built-in function numpy:
numpy(...) method of torch.Tensor instance
numpy() -> numpy.ndarray
Returns :attr:`self` tensor as a NumPy :class:`ndarray`. This tensor and the
returned :class:`ndarray` share the same underlying storage. Changes to
:attr:`self` tensor will be reflected in the :class:`ndarray` and vice versa.
This numpy() function is the converter form torch.Tensor to numpy array.
If we look at this code below, we see a simple example where the .numpy() convert Tensors to numpy arrays automatically.
import numpy as np
ndarray = np.ones([3, 3])
print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)
print("And NumPy operations convert Tensors to numpy arrays automatically")
print(np.add(tensor, 1))
print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())
In the 2nd last line of code, we see that the tensorflow officials declared it as the converter of Tensor to a numpy array.
You may check it out here

Use coo_matrix in TensorFlow

I'm doing a Matrix Factorization in TensorFlow, I want to use coo_matrix from Spicy.sparse cause it uses less memory and it makes it easy to put all my data into my matrix for training data.
Is it possible to use coo_matrix to initialize a variable in tensorflow?
Or do I have to create a session and feed the data I got into tensorflow using sess.run() with feed_dict.
I hope that you understand my question and my problem otherwise comment and i will try to fix it.
The closest thing TensorFlow has to scipy.sparse.coo_matrix is tf.SparseTensor, which is the sparse equivalent of tf.Tensor. It will probably be easiest to feed a coo_matrix into your program.
A tf.SparseTensor is a slight generalization of COO matrices, where the tensor is represented as three dense tf.Tensor objects:
indices: An N x D matrix of tf.int64 values in which each row represents the coordinates of a non-zero value. N is the number of non-zeroes, and D is the rank of the equivalent dense tensor (2 in the case of a matrix).
values: A length-N vector of values, where element i is the value of the element whose coordinates are given on row i of indices.
dense_shape: A length-D vector of tf.int64, representing the shape of the equivalent dense tensor.
For example, you could use the following code, which uses tf.sparse_placeholder() to define a tf.SparseTensor that you can feed, and a tf.SparseTensorValue that represents the actual value being fed :
sparse_input = tf.sparse_placeholder(dtype=tf.float32, shape=[100, 100])
# ...
train_op = ...
coo_matrix = scipy.sparse.coo_matrix(...)
# Wrap `coo_matrix` in the `tf.SparseTensorValue` form that TensorFlow expects.
# SciPy stores the row and column coordinates as separate vectors, so we must
# stack and transpose them to make an indices matrix of the appropriate shape.
tf_coo_matrix = tf.SparseTensorValue(
indices=np.array([coo_matrix.rows, coo_matrix.cols]).T,
values=coo_matrix.data,
dense_shape=coo_matrix.shape)
Once you have converted your coo_matrix to a tf.SparseTensorValue, you can feed sparse_input with the tf.SparseTensorValue directly:
sess.run(train_op, feed_dict={sparse_input: tf_coo_matrix})