PyTorch: NumPy Arrays

Can I use NumPy arrays when using PyTorch?
I am converting code from TensorFlow to PyTorch, and the code uses NumPy arrays during the computation. Can I keep my inputs as NumPy arrays during the computation, or do I have to convert them to torch tensors?

If that array is being passed to a PyTorch model built from torch.nn layers, then it must be a torch.Tensor, not a NumPy array.
Depending on the layer, the tensor also has to have a specific shape: for nn.Conv2d layers you must pass a 4-D tensor, and for nn.Linear a 2-D tensor.
This is one among many reasons it cannot be a NumPy array.
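As a minimal sketch of the conversion at the model boundary (the model and shapes here are assumptions for illustration, not from the original question), torch.from_numpy turns the array into a tensor without copying:
import numpy as np
import torch
import torch.nn as nn

model = nn.Linear(3, 1)             # expects a 2-D float tensor: (batch, features)
x_np = np.random.rand(4, 3)         # NumPy input from the original pipeline

# Convert once at the boundary; from_numpy shares memory with the array,
# and .float() casts the float64 array to the float32 the layer expects.
x = torch.from_numpy(x_np).float()
out = model(x)                      # passing x_np directly would fail
print(out.shape)                    # torch.Size([4, 1])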
Sarthak

Related

Theano to tensorflow conversion

What is the tensorflow/keras equivalent of Theano's gt?
Based on Theano's documentation, theano.tensor.gt returns a symbolic 'int8' tensor representing the result of logical greater-than (a>b).
In Theano, I have:
import theano.tensor as T
posInd1 = T.gt(D1, eps).nonzero()[0]
How do I convert it to TensorFlow?
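One plausible TensorFlow equivalent (a sketch assuming D1 is a 1-D tensor and eps a scalar; this is not an answer from the original thread) combines tf.greater with tf.where:
import tensorflow as tf

D1 = tf.constant([0.1, 0.5, 0.02, 0.9])
eps = 0.05

# tf.where with a single boolean argument returns the coordinates of the
# True entries, analogous to Theano's .nonzero(); for a 1-D tensor,
# column 0 holds the indices.
posInd1 = tf.where(tf.greater(D1, eps))[:, 0]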

Converting tensorflow dataset to numpy array

I have an autoencoder defined using tf.keras in TensorFlow 1.15. I cannot upgrade to TensorFlow 2.0 for some specific reasons.
This particular autoencoder is used for anomaly detection. I currently compute the AUC score of the autoencoder as follows:
1. All anomalous inputs are labelled 1 and all normal inputs are labelled 0. This is y_true.
2. I feed the autoencoder with unseen inputs and then measure the reconstruction error, like so: errors = np.mean(np.square(data - model.predict(data)), axis=-1)
3. The mean of this array is then taken as the predicted label, y_pred.
4. I then compute the AUC using auc = metrics.roc_auc_score(y_true, y_pred).
This approach works well. I now need to move towards using tf.data.Dataset to feed in my data; previously, it was NumPy arrays. The issue is that I am unable to convert a tf.data.Dataset to a NumPy array, and hence I am unable to compute the mean squared error as seen in step 2.
Once I have a tf.data.Dataset, I feed it for prediction like so: results = model.predict(x_test)
This yields a NumPy array, results. I want to compute the mean squared error of results with x_test. However, x_test is of type tf.data.Dataset. So the question is: how can I convert a tf.data.Dataset to a NumPy array in TensorFlow 1.15, or what is an alternative method to do this?
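One way to materialize the dataset as NumPy arrays in TensorFlow 1.15 (a sketch assuming graph mode and that x_test yields plain tensors; not from the original question) is to drain a one-shot iterator in a session:
import numpy as np
import tensorflow as tf

iterator = tf.compat.v1.data.make_one_shot_iterator(x_test)
next_batch = iterator.get_next()

# Run the iterator until the dataset is exhausted, collecting each batch.
batches = []
with tf.Session() as sess:
    while True:
        try:
            batches.append(sess.run(next_batch))
        except tf.errors.OutOfRangeError:
            break
x_test_np = np.concatenate(batches, axis=0)
# x_test_np can now be used in the original NumPy-based error computation.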

What does the .numpy() function do?

I tried searching for the documentation online but I can't find anything that gives me an answer. What does the .numpy() function do? The example code given is:
y_true = []
for X_batch, y_batch in mnist_test:
    y_true.append(y_batch.numpy()[0].tolist())
In both PyTorch and TensorFlow, the .numpy() method is pretty much straightforward: it converts a tensor object into a numpy.ndarray object. This implicitly means that the converted tensor will now be processed on the CPU.
Whenever you have a problem understanding a PyTorch function, you can ask help():
import torch
t = torch.tensor([1,2,3])
help(t.numpy)
Out:
Help on built-in function numpy:
numpy(...) method of torch.Tensor instance
numpy() -> numpy.ndarray
Returns :attr:`self` tensor as a NumPy :class:`ndarray`. This tensor and the
returned :class:`ndarray` share the same underlying storage. Changes to
:attr:`self` tensor will be reflected in the :class:`ndarray` and vice versa.
This numpy() function is the converter from torch.Tensor to a NumPy array.
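Note the shared-storage behaviour mentioned in the docstring above; a quick sketch to illustrate it:
import torch

t = torch.tensor([1, 2, 3])
a = t.numpy()
t[0] = 99        # mutate the tensor in place
print(a)         # [99  2  3] -- the ndarray reflects the change (shared storage)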
If we look at the code below, we see a simple example where Tensors and NumPy arrays are converted back and forth automatically, and where .numpy() performs the conversion explicitly.
import tensorflow as tf
import numpy as np
ndarray = np.ones([3, 3])
print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)
print("And NumPy operations convert Tensors to numpy arrays automatically")
print(np.add(tensor, 1))
print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())
In the second-to-last line of code, we see that the TensorFlow documentation itself describes .numpy() as the method that explicitly converts a Tensor to a NumPy array.

tensorflow matrix_band_part function equivalent in pytorch

I need to create an upper-triangular masking tensor in PyTorch. In TensorFlow it's easy using matrix_band_part. Is there any PyTorch equivalent for this function?
It's like NumPy's triu_indices, but I need it for a tensor, not just a matrix.
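For what it's worth, one candidate (a sketch, not an answer from the original thread) is torch.triu, which keeps the upper triangle of the last two dimensions and therefore also works on batched tensors:
import torch

x = torch.ones(2, 4, 4)   # a batch of two 4x4 matrices

# Similar in effect to tf.linalg.band_part(x, 0, -1) in TensorFlow.
mask = torch.triu(x)
print(mask[0])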

How to read SciPy sparse matrix into Tensorflow's placeholder

It's possible to read dense data by this way:
# tf - tensorflow, np - numpy, sess - session
m = np.ones((2, 3))
placeholder = tf.placeholder(tf.int32, shape=m.shape)
sess.run(placeholder, feed_dict={placeholder: m})
How to read scipy sparse matrix (for example scipy.sparse.csr_matrix) into tf.placeholder or maybe tf.sparse_placeholder ?
I think that currently TF does not have a good way to read from sparse data. If you do not want to convert your sparse matrix into a dense one, you can try to construct a sparse tensor.
Here is what the official tutorial tells you:
SparseTensors don't play well with queues. If you use SparseTensors
you have to decode the string records using tf.parse_example after
batching (instead of using tf.parse_single_example before batching).
To feed a SciPy sparse matrix to a TF placeholder:
Option 1: use tf.sparse_placeholder. The question Use coo_matrix in TensorFlow shows a way to feed data to a sparse_placeholder.
Option 2: convert the sparse matrix to a dense NumPy matrix and feed it to tf.placeholder (of course, this way is impossible when the converted dense matrix does not fit in memory).
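A minimal sketch of option 1 (TensorFlow 1.x assumed; the example matrix and variable names are illustrative):
import numpy as np
import scipy.sparse as sp
import tensorflow as tf

m = sp.csr_matrix(np.eye(3))
coo = m.tocoo()                                  # COO exposes row/col/data directly
indices = np.stack([coo.row, coo.col], axis=1)   # shape (nnz, 2)

sp_ph = tf.sparse_placeholder(tf.float32, shape=m.shape)
dense = tf.sparse_tensor_to_dense(sp_ph)

with tf.Session() as sess:
    feed = tf.SparseTensorValue(indices, coo.data.astype(np.float32), coo.shape)
    print(sess.run(dense, feed_dict={sp_ph: feed}))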