TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first (fastai)

I am following the code here:
https://www.kaggle.com/tanlikesmath/diabetic-retinopathy-with-resnet50-oversampling
However, during the metrics calculation, I am getting the following error:
File "main.py", line 50, in <module>
learn.fit_one_cycle(4,max_lr = 2e-3)
...
File "main.py", line 39, in quadratic_kappa
return torch.tensor(cohen_kappa_score(torch.argmax(y_hat,1), y, weights='quadratic'),device='cuda:0')
...
File "/pfs/work7/workspace/scratch/ul_dco32-conda-0/conda/envs/resnet50/lib/python3.8/site-packages/torch/tensor.py", line 486, in __array__
return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Here are the metrics and the model:
def quadratic_kappa(y_hat, y):
    return torch.tensor(cohen_kappa_score(torch.argmax(y_hat, 1), y, weights='quadratic'), device='cuda:0')

learn = cnn_learner(data, models.resnet50, metrics=[accuracy, quadratic_kappa])
learn.fit_one_cycle(4, max_lr=2e-3)
As is said in the discussion https://discuss.pytorch.org/t/typeerror-can-t-convert-cuda-tensor-to-numpy-use-tensor-cpu-to-copy-the-tensor-to-host-memory-first/32850/6, I have to bring the data back to the CPU, but I am slightly lost as to how to do it.
I tried adding .cpu() all over the metrics but have not been able to solve it so far.

I'm assuming that both y and y_hat are CUDA tensors, which means you need to bring them both to the CPU before calling cohen_kappa_score, not just one of them.
def quadratic_kappa(y_hat, y):
    return torch.tensor(cohen_kappa_score(torch.argmax(y_hat.cpu(), 1), y.cpu(), weights='quadratic'), device='cuda:0')
    # note the two added .cpu() calls, on both y_hat and y
Calling .cpu() on a tensor that is already on the CPU has no effect, so it's safe to use in any case.
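You can verify this with a quick sketch (the claim that .cpu() returns the original object when no copy is needed matches PyTorch's documented behavior):
import torch

t = torch.ones(3)    # a tensor that already lives on the CPU
print(t.cpu() is t)  # True: no copy is made, the same object comes back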

I went from a CPU to a GPU version and received this error. It was caused by passing metrics=[mean_absolute_error, mean_squared_error] to the Learner object (in my case tabular_learner).
Removing the metrics parameter solved the issue temporarily for me.
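If you want to keep the metrics instead of dropping them, the same .cpu() trick from the answer above applies. A minimal sketch, assuming the metrics were imported from scikit-learn (the wrapper names here are made up for illustration):
import torch
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical CPU-safe wrappers: move both tensors to host memory before
# handing them to scikit-learn, which converts its inputs with np.asarray.
def mae_cpu(y_hat, y):
    return torch.tensor(mean_absolute_error(y_hat.cpu(), y.cpu()))

def mse_cpu(y_hat, y):
    return torch.tensor(mean_squared_error(y_hat.cpu(), y.cpu()))

# learn = tabular_learner(data, layers=[200, 100], metrics=[mae_cpu, mse_cpu])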

Related

TypeError: 'TensorShape' object is not callable

I am new to TensorFlow programming. I was digging into some functions and got this error in the following snippet:
with tf.Session() as sess_1:
    c = tf.constant(5)
    d = tf.constant(6)
    e = c + d
    print(sess_1.run(e))
    print(sess_1.run(e.shape()))
Error found:
Traceback (most recent call last):
File "C:/Users/Ashu/PycharmProjects/untitled/Bored.py", line 15, in <module>
print(sess_1.run(e.shape()))
TypeError: 'TensorShape' object is not callable
I didn't find this covered here, so can anyone please clarify this doubt, as I am a new learner? Sorry for any typing mistakes!
I have one more doubt: when I simply use the eval() function, it doesn't print anything in PyCharm; I had to use it along with the print() method. And when the print() method is used, it doesn't print the dtype of the tensor; it simply prints the tensor or Python object value. Why am I not getting output in a format like array([1., 1.], dtype=float32)? Is this how PyCharm prints tensors in the new version, or is it something I am doing wrong? I'm excited to learn what's behind this; please help, and pardon me if I am wrong anywhere.
One confusing aspect of tensorflow for beginners is that there are two types of shape: the dynamic shape, given by tf.shape(x), and the static shape, given by x.shape (assuming x is a tensor). While they represent the same concept, they are used very differently.
Static shape is the shape of a tensor known at graph construction time. It's a data type in its own right, but it can be converted to a list using as_list().
x = tf.placeholder(tf.float32, shape=(None, 3, 4))
static_shape = x.shape
shape_list = x.shape.as_list()
print(shape_list) # [None, 3, 4]
y = tf.reduce_sum(x, axis=1)
print(y.shape.as_list()) # [None, 4]
During operations, tensorflow tracks static shapes as best it can. In the above example, y's shape was calculated based on the partially known shape of x. Note we haven't even created a session, yet the static shape is already known.
Since the batch size is not known, you can't use the first entry of the static shape in calculations.
z = tf.reduce_sum(x) / tf.cast(x.shape.as_list()[0], tf.float32) # ERROR
(we could have divided by x.shape.as_list()[1], since that dimension is known statically - but that wouldn't demonstrate anything here)
If we need to use a value which is not known statically - i.e. at graph construction time - we can use the dynamic shape of x. The dynamic shape is a tensor - like other tensors in tensorflow - which is evaluated using a session.
z = tf.reduce_sum(x) / tf.cast(tf.shape(x)[0], tf.float32) # all good!
You can't call as_list on the dynamic shape, nor can you inspect its values without going through a session evaluation.
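For example, a short sketch of evaluating the dynamic shape of the placeholder above (assuming numpy is available for the feed):
import numpy as np

with tf.Session() as sess:
    # tf.shape(x) is itself a tensor, so it must be run inside a session,
    # and the placeholder needs concrete data before a batch size exists.
    print(sess.run(tf.shape(x), feed_dict={x: np.zeros((2, 3, 4))}))  # [2 3 4]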
As stated in the documentation, you can only call a session's run method with tensors, operations, or lists of tensors/operations. Your last line of code calls e.shape as if it were a function, but e.shape is a TensorShape object, which is not callable, hence the TypeError; the call fails before sess_1.run even gets an argument to execute.
When you call print with a tensor, the system prints the tensor's content. If you want to print the tensor's type, use code like print(type(tensor)).
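Putting it together, a corrected version of the snippet might look like this (a sketch using the same TF 1.x API as the question):
with tf.Session() as sess_1:
    c = tf.constant(5)
    d = tf.constant(6)
    e = c + d
    print(sess_1.run(e))            # 11
    print(e.shape)                  # static shape, known without running the graph
    print(sess_1.run(tf.shape(e)))  # dynamic shape, a tensor evaluated by the session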

Why does the weight matrix of the mxnet.gluon.nn.Dense object have no shape?

I am trying to follow this nice MXNet Tutorial. I create an extremely simple neural network (two input units, no hidden units, and one output unit) by doing this:
from mxnet import gluon
net = gluon.nn.Dense(1, in_units=2)
After that I try to take a look at the shape of the weight matrix (the same way as it is described in the tutorial):
print(net.weight)
As a result I expect to see this:
Parameter dense4_weight (shape=(1, 2), dtype=None)
However, I see the following error message:
Traceback (most recent call last):
File "tmp.py", line 5, in <module>
print(net.weight)
File "/usr/local/lib/python3.6/site-packages/mxnet/gluon/parameter.py", line 120, in __repr__
return s.format(**self.__dict__)
KeyError: 'shape'
Am I doing something wrong?
This is a regression that was introduced here and has since been fixed on the master branch here. Expect it to be fixed in the next MXNet release.
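Until the release lands, a possible workaround (a sketch; I'm assuming only Parameter.__repr__ is broken and the shape itself is still accessible) is to read the attribute directly instead of printing the parameter object:
from mxnet import gluon

net = gluon.nn.Dense(1, in_units=2)
# Bypass the broken __repr__ by asking for the shape attribute directly.
print(net.weight.shape)  # expected: (1, 2)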

Declaring theano variables for pymc3

I am having issues replicating some pymc2 code using pymc3.
I believe it is due to the fact that pymc3 uses theano-type variables, which are not compatible with the numpy operations I am using. So I am using the @theano.compile.ops.as_op decorator:
I have this function:
with pymc3.Model() as model:
    z_stars = pymc3.Uniform('z_star', self.z_min_ssp_limit, self.z_max_ssp_limit)
    Av_stars = pymc3.Uniform('Av_star', 0.0, 5.00)
    sigma_stars = pymc3.Uniform('sigma_star', 0.0, 5.0)

    # Fit observational wavelength
    ssp_fit_output = self.ssp_fit_theano(z_stars, Av_stars, sigma_stars,
                                         self.obj_data['obs_wave_resam'],
                                         self.obj_data['obs_flux_norm_masked'],
                                         self.obj_data['basesWave_resam'],
                                         self.obj_data['bases_flux_norm'],
                                         self.obj_data['int_mask'],
                                         self.obj_data['normFlux_obs'])

    # Define likelihood
    like = pymc3.Normal('ChiSq', mu=ssp_fit_output,
                        sd=self.obj_data['obs_fluxEr_norm'],
                        observed=self.obj_data['obs_fluxEr_norm'])

    # Run the sampler
    trace = pymc3.sample(iterations, step=step, start=start_conditions, trace=db)
where:
@theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar, t.dvector,
                                  t.dvector, t.dvector, t.dvector, t.dvector,
                                  t.dscalar],
                          otypes=[t.dvector])
def ssp_fit_theano(self, input_z, input_sigma, input_Av, obs_wave, obs_flux_masked,
                   rest_wave, bases_flux, int_mask, obsFlux_mean):
    ...
    ...
The first three variables are scalars (from the pymc3 uniform distributions). The remaining variables are numpy arrays, and the last one is a float. However, I am getting this "'numpy.ndarray' object has no attribute 'type'" error:
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 615, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 963, in make_node
if not all(inp.type == it for inp, it in zip(inputs, self.itypes)):
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 963, in <genexpr>
if not all(inp.type == it for inp, it in zip(inputs, self.itypes)):
AttributeError: 'numpy.ndarray' object has no attribute 'type'
Any advice pointing in the right direction will be most welcome.
I hit a bunch of time-wasting stops when I went from pymc2 to pymc3. The problem, I think, is that the documentation is quite poor; I suspect it is being neglected while the code is still evolving. Three comments/pieces of advice:
You may find some help with '@theano.compile.ops.as_op' here: failure to adapt pymc2 into pymc3, or here: how to fit a method belonging to an instance with pymc3?
The drawback of '@theano.compile.ops.as_op' is that you implicitly exclude any analysis related to the gradient of your function. To have access to the gradient, I think you need to define your function in the more involved way presented here: how to fit a method belonging to an instance with pymc3?
Warning: for the moment, using theano seems to be a source of problems if you want to distribute your code under Windows. See: build a .exe for Windows from a python 3 script importing theano with pyinstaller. I am not sure whether that is just personal clumsiness or a real problem, but personally I had to give up theano to be able to distribute my code.
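As for the traceback itself: it complains that a plain numpy array has no .type attribute, because an op declared with itypes expects theano variables as inputs. One way around this (a sketch with made-up data, not the asker's actual arrays) is to wrap the constant numpy arrays before passing them to the op:
import numpy as np
import theano

# Stand-in for one of the obj_data arrays from the question.
obs_wave = np.linspace(4000.0, 7000.0, 100)

# theano.shared turns the array into a theano variable carrying a .type,
# which is what the itypes check inside as_op requires.
obs_wave_t = theano.shared(obs_wave)
print(obs_wave_t.type)  # TensorType(float64, vector)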

How to convert Pytorch autograd.Variable to Numpy?

The title says it all. I want to convert a PyTorch autograd.Variable to its equivalent numpy array. In their official documentation they advocate using a.numpy() to get the equivalent numpy array (for a PyTorch tensor). But this gives me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bishwajit/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 63, in __getattr__
raise AttributeError(name)
AttributeError: numpy
Is there any way I can circumvent this?
There are two possible cases:
Using GPU: If you try to convert a cuda float-tensor directly to numpy as shown below, it will throw an error.
x.data.numpy()
RuntimeError: numpy conversion for FloatTensor is not supported
So you can't convert a cuda float-tensor directly to numpy; instead, you have to convert it into a cpu float-tensor first and then try converting it into numpy, as shown below.
x.data.cpu().numpy()
Using CPU: Converting a CPU tensor is straightforward.
x.data.numpy()
I have found the way. I can first extract the Tensor data from the autograd.Variable by using a.data. Then the rest is really simple: I just use a.data.numpy() to get the equivalent numpy array. Here are the steps:
a = a.data # a is now torch.Tensor
a = a.numpy() # a is now numpy array
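Combining this with the GPU case from the other answer, a self-contained sketch (assuming the old Variable API from the question):
import torch
from torch.autograd import Variable

v = Variable(torch.ones(2, 2))  # the old-style Variable wrapper
arr = v.data.numpy()            # .data unwraps the Tensor, .numpy() converts it
print(arr)
# For a Variable living on the GPU, hop through host memory first:
# arr = v.data.cpu().numpy()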

TensorFlow: Android demo accuracy

What related GitHub issues or Stack Overflow threads have you found by searching the web for your problem?
I searched #1269 and #504.
Environment info
Mac OS for build and Android version 5 to run .apk demo.
If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)
I followed the steps mentioned in #1269 and was able to run the example successfully, but the accuracy of the result is very low and often wrong. I have trained my system on 25 different everyday products like soap, soup, noodles, etc.
Whereas when I run the same example using the following script, it gives me very high accuracy (approx. 90-95%):
import sys
import tensorflow as tf

# change this as you see fit
image_path = sys.argv[1]

# Read in the image_data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()

# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("/tf_files/retrained_labels.txt")]

# Unpersists graph from file
with tf.gfile.FastGFile("/tf_files/retrained_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # Feed the image_data as input to the graph and get first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    # Sort to show labels of first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))
The only difference I see here is that the model file used in the Android demo is stripped because it does not support DecodeJpeg, whereas in the above code it's the actual, unstripped generated model. Is there a specific reason for this, or am I wrong somewhere?
I also tried using optimize_for_inference, but unfortunately it fails with the following error:
[milinddeore#P028: ~/tf/tensorflow ] bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/Users/milinddeore/tf_files_nm/retrained_graph.pb --output=/Users/milinddeore/tf/tensorflow/tensorflow/examples/android/assets/tf_ul_stripped_graph.pb --input_names=DecodeJpeg/content —-output_names=final_result
Traceback (most recent call last):
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/optimize_for_inference.py", line 141, in <module>
app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/optimize_for_inference.py", line 90, in main
FLAGS.output_names.split(","), FLAGS.placeholder_type_enum)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/optimize_for_inference_lib.py", line 91, in optimize_for_inference
placeholder_type_enum)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/tools/strip_unused_lib.py", line 71, in strip_unused
output_node_names)
File "/Users/milinddeore/tf/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference.runfiles/org_tensorflow/tensorflow/python/framework/graph_util_impl.py", line 141, in extract_sub_graph
assert d in name_to_node_map, "%s is not in graph" % d
AssertionError: is not in graph
I suspect this problem is due to the Android build not being able to parse DecodeJpeg, but please correct me if I am wrong.
What other attempted solutions have you tried?
Yes, I tried the above script, and it gives me quite a high-accuracy result.
Well, the reason for the bad accuracy is the following:
I ran this example code on a Lenovo Vibe K5 phone (which has a Snapdragon 415). The demo wasn't compiled for the Hexagon DSP, and the DSP on the 415 is very old compared to the 835 (Hexagon DSP 682); in fact, I am not sure whether the Hexagon SDK works with the 415 at all, though I haven't tried it. This means the example was running on the CPU, first detecting motion and then classifying, hence the poor performance.
The slow FPS means images are captured very slowly, so moving objects are really difficult to handle.
And if you have a bad image, there is a very strong chance the prediction will also be bad.
Camera capture and classification take a long time; due to the latency, it's not quite real-time.