I am facing an issue loading a PyTorch model that was trained on a GPU onto a CPU-only machine. The model loads successfully, but I get an error when predicting. On a GPU machine prediction works, but on the CPU it does not:
My code:
**To save the model I am using:**
PATH = "model.pt"
torch.save(model, PATH)
**To Load the Model**
import torch
PATH = "model.pt"
device = torch.device('cpu')
loaded_model=torch.load(PATH, map_location=device)
The model loads successfully, but prediction fails with a runtime error.
**Predicting with the loaded model on CPU**
predicted_title = loaded_model.predict([abstract])
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver
Sorry if this turns out to be a simple fix, but I have not been able to resolve it.
You can print the model's device with
print(loaded_model.device)
If it is not cpu, move the model to the CPU:
loaded_model = loaded_model.to('cpu')
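If `print(loaded_model.device)` raises an AttributeError, note that a plain `torch.nn.Module` has no `device` attribute (Hugging Face models add one); the device lives on the parameters instead. Also, `map_location` only remaps tensor storages while unpickling: if the object's own `predict` method moves inputs to CUDA (for example via a stored device setting), remapping the weights will not stop that, which may be why loading succeeds but prediction fails. The sketch below is a hedged illustration, assuming a hypothetical `TinyModel` in place of the real architecture: it checks the device through the parameters and uses the weights-only `state_dict` save/load that PyTorch's serialization docs recommend over pickling the whole model.

```python
import os
import tempfile
from typing import Callable

import torch
import torch.nn as nn


def model_device(model: nn.Module) -> torch.device:
    """Device of the model's first parameter; CPU if the model has no parameters."""
    first = next(model.parameters(), None)
    return first.device if first is not None else torch.device("cpu")


class TinyModel(nn.Module):
    """Hypothetical stand-in for the real trained architecture."""

    def __init__(self) -> None:
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


def load_on_cpu(build: Callable[[], nn.Module], path: str) -> nn.Module:
    """Rebuild the architecture, then load a saved state_dict with every tensor mapped to CPU."""
    model = build()
    state = torch.load(path, map_location=torch.device("cpu"))
    model.load_state_dict(state)
    return model.eval()  # eval() returns the module itself


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model_state.pt")
    # On the training (GPU) machine: save only the weights, not the pickled object.
    torch.save(TinyModel().state_dict(), path)
    # On the CPU machine: rebuild and load.
    cpu_model = load_on_cpu(TinyModel, path)
    print(model_device(cpu_model))  # cpu
    with torch.no_grad():
        print(cpu_model(torch.randn(1, 4)).shape)  # torch.Size([1, 2])
```

With this approach the GPU machine never pickles device-specific Python state, so only the architecture code and the weights file need to travel to the CPU machine.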
I'm trying to run a converted RepNet model in TFLite on mobile (iOS and Android) using Flutter and the tflite_flutter package.
I have successfully converted the model to TFLite by adapting the Colab notebook provided by the authors. The notebook I used can be found in this repository, along with the converted model.
To make sure everything worked before running on an edge device, I checked it with the Python TFLite API (this notebook). Everything worked well: the output of the TFLite model matches the output of the authors' Google Colab notebook.
I created a Flutter project to run the model on mobile. I've tried passing in the default input and output tensors returned by interpreter.getInputTensors() and interpreter.getOutputTensors(), respectively. When I run the model from this project, I encounter the following error:
E/tflite (26540): tensorflow/lite/kernels/reshape.cc:69 num_input_elements != num_output_elements (1 != 0)
E/tflite (26540): Node number 1 (RESHAPE) failed to prepare.
I'm admittedly pretty new to TensorFlow and TensorFlow Lite, so my debugging ability is somewhat limited. It seems strange to me that the RESHAPE node's output has 0 elements. Since the model works with the Python API, I'm not sure why it fails on-device. My only suspicion is that the batch size is not configured properly: the shape_signature field (interpreter.get_input_details()[0]['shape_signature']) shows that the batch size is dynamic (value -1).
The model was converted using TensorFlow 2.5 in Python and is being run with the standard TFLite 2.5 binaries (no GPUDelegate).
Any suggestions for fixing this error would be appreciated!
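One hedged thing to try, given the -1 batch dimension: give every dynamic input a concrete shape and allocate tensors before invoking, so the interpreter recomputes intermediate shapes such as the RESHAPE output. In the Python API that is `resize_tensor_input` followed by `allocate_tensors`; the `tflite_flutter` Interpreter appears to expose counterparts (`resizeInputTensor` and `allocateTensors`), but check the package docs for your version. The helper names below are hypothetical, and the sketch only fills in a leading batch dimension.

```python
from typing import List, Sequence


def concrete_input_shape(shape_signature: Sequence[int], batch_size: int = 1) -> List[int]:
    """Replace a dynamic (-1) leading batch dimension with a fixed batch size.

    Any other dynamic dimension is ambiguous, so this refuses to guess and raises.
    """
    shape = [int(d) for d in shape_signature]
    if shape and shape[0] == -1:
        shape[0] = batch_size
    if any(d == -1 for d in shape):
        raise ValueError(f"non-batch dynamic dimension in {list(shape_signature)}")
    return shape


def pin_dynamic_inputs(interpreter, batch_size: int = 1) -> None:
    """Resize every dynamic input of a tf.lite.Interpreter, then (re)allocate tensors."""
    for detail in interpreter.get_input_details():
        signature = detail["shape_signature"]
        if any(int(d) == -1 for d in signature):
            interpreter.resize_tensor_input(
                detail["index"], concrete_input_shape(signature, batch_size)
            )
    # Allocation has to follow resizing so downstream node shapes are prepared again.
    interpreter.allocate_tensors()
```

Usage would look like `interpreter = tf.lite.Interpreter(model_path="repnet.tflite")` (hypothetical file name) followed by `pin_dynamic_inputs(interpreter)`, before setting inputs and calling `invoke()`. On the Flutter side, the equivalent would be resizing input 0 to the concrete shape and allocating tensors before `run`.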
When I run an experiment with H2O AutoML, I get the error: "terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: invalid resource handle". This error comes from XGBoost and is caused by exceeding the GPU limit.
With standalone XGBoost, I set the CUDA_VISIBLE_DEVICES environment variable to an empty string to disable GPUs. However, this setting seems to be ignored by the XGBoost implementation in H2O AutoML:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Currently, XGBoost is the only algorithm in H2O AutoML that can run on a GPU.
Does anybody know how to disable GPUs in H2O AutoML?
As a workaround, I have excluded the XGBoost algorithm from my experiments for now. The problem goes away when XGBoost is excluded, but I do not want to give up the power of XGBoost:
from h2o.automl import H2OAutoML
model = H2OAutoML(max_runtime_secs = 60*60*2, exclude_algos = ["XGBoost"])
That's definitely an oversight and we will need to add the ability to turn on/off and/or specify the GPU. I opened a ticket for this. I wonder if there's a way to temporarily disable the GPU at the system level (outside of H2O/Python) in the meantime? Thanks for the report!
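One possible reason the `os.environ` setting seems to be ignored (an assumption, not something confirmed above): H2O's XGBoost runs inside the H2O Java process rather than in Python, and a child process only inherits environment variables that were set before it was launched. If the cluster was already running, or was started from a shell, the Python-side setting never reaches it. The sketch below demonstrates that inheritance rule with a plain Python child process standing in for the H2O JVM; the helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def env_without_gpus(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with CUDA_VISIBLE_DEVICES set to "" (CUDA then sees no GPUs)."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def value_seen_by_child(name: str, env: Mapping[str, str]) -> str:
    """Launch a child Python process with `env`; return its value for `name` or '<unset>'."""
    code = f"import os, sys; sys.stdout.write(os.environ.get({name!r}, '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=dict(env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child only sees the variable because it was put in its environment at launch.
    print(repr(value_seen_by_child("CUDA_VISIBLE_DEVICES", env_without_gpus())))
```

In practice this would mean setting CUDA_VISIBLE_DEVICES before calling h2o.init() in a fresh session, so that a newly launched local cluster inherits it, or exporting it (empty) in the shell before starting a standalone H2O server. Whether H2O's XGBoost then falls back to the CPU is untested here.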
I get the message "Your session crashed for an unknown reason" when I run the following cell in Google Colab:
from keras import backend as K
if 'tensorflow' == K.backend():
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.visible_device_list = "0"
    set_session(tf.Session(config=config))
I have been receiving this message since I uploaded two data sets to Google Drive.
Does anyone recognize this message and have any advice?
Many thanks for any hints.
Update: I still receive the message every time.
Update: I have removed the data sets from Google Drive, but the session still crashes.
Google Colab is crashing because you are trying to run GPU-specific code with the runtime type set to CPU.
The code runs successfully once you switch the runtime to GPU:
Runtime -> Change runtime type -> GPU (select it from the Hardware accelerator dropdown).
The working code is available in a GitHub Gist.
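The cell in the question uses the TensorFlow 1.x API (`tf.ConfigProto`, `tf.Session`, `keras.backend.tensorflow_backend`), which TensorFlow 2.x has either moved under `tf.compat.v1` or removed. A hedged TF 2.x sketch of the same intent (only the first GPU visible, memory growth enabled) that also checks whether a GPU exists before configuring it, so a CPU runtime skips the GPU setup instead of assuming it; the function name is hypothetical:

```python
import tensorflow as tf


def configure_gpu_if_present() -> bool:
    """Enable memory growth on the first GPU if one is visible; return whether one was found."""
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        # CPU runtime: skip all GPU-specific configuration.
        print("No GPU visible; running on CPU.")
        return False
    # Must run before TensorFlow initializes the GPU (i.e. before building any model).
    tf.config.set_visible_devices(gpus[0], "GPU")
    tf.config.experimental.set_memory_growth(gpus[0], True)
    return True
```

Call it once at the top of the notebook, before any model or tensor is created; calling it after the GPU has been initialized makes TensorFlow raise a RuntimeError.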
Just a side note: sometimes it helps to reinstall a slightly older version of the module named in the error log. That worked for me in one case.
This error happens when the expected device and the actual device differ.
For example, if you run code written with torch_xla, which targets TPU training, on a GPU (CUDA) runtime, Colab returns this error.
It is tricky because Colab gives no actual debugging message, which makes the real problem hard to find.
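Since Colab gives no debugging detail in this case, one hedged way to get a readable error is to check the runtime explicitly before any device-specific code runs. A minimal PyTorch sketch for the CUDA case (TPU/torch_xla detection is not covered; `pick_device` is a hypothetical helper):

```python
import torch


def pick_device(require_gpu: bool = False) -> torch.device:
    """Use CUDA when a GPU is visible, else CPU; optionally fail fast with a readable error."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if require_gpu:
        raise RuntimeError(
            "This notebook expects a GPU runtime, but no CUDA device is visible. "
            "In Colab, switch via Runtime -> Change runtime type."
        )
    return torch.device("cpu")


if __name__ == "__main__":
    device = pick_device()
    x = torch.ones(2, 2, device=device)  # created on whichever device was picked
    print(device.type, x.sum().item())
```

Passing `device` everywhere instead of hard-coding "cuda" keeps the same notebook runnable on both runtime types.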
I am running the sample program that comes packaged with the TensorFlow Object Detection API (object_detection_tutorial.ipynb).
The program runs without errors, but no bounding boxes are displayed at all.
My environment is as follows:
Windows 10
Python 3.6.3
What could be the reason?
With regards
Manish
It seems that the latest version of the model, ssd_mobilenet_v1_coco_2017_11_08, doesn't work and outputs abnormally low values. Replacing it in the Jupyter Notebook with an older version of the model works fine for me:
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
Ref: https://github.com/tensorflow/models/issues/2773
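The step that links low values to missing boxes: the notebook only draws detections that score above a threshold (`min_score_thresh` in `visualize_boxes_and_labels_on_image_array`, default 0.5), so a model that outputs uniformly low scores produces an image with no boxes and no error. Below is a simplified, hypothetical re-creation of that filter with made-up boxes and scores, not the library's own code:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalized coordinates


def boxes_to_draw(
    boxes: Sequence[Box], scores: Sequence[float], min_score_thresh: float = 0.5
) -> List[Box]:
    """Keep only detections whose score is strictly above the drawing threshold."""
    return [box for box, score in zip(boxes, scores) if score > min_score_thresh]


if __name__ == "__main__":
    boxes = [(0.1, 0.1, 0.5, 0.5), (0.2, 0.3, 0.9, 0.8)]
    print(len(boxes_to_draw(boxes, [0.92, 0.71])))  # 2: healthy scores, both drawn
    print(len(boxes_to_draw(boxes, [0.04, 0.02])))  # 0: abnormally low scores, nothing drawn
```

Printing the raw detection scores, or temporarily passing a lower `min_score_thresh`, is a quick way to tell a broken model apart from a broken drawing step.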
Please try the updated SSD models in the detection model zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md. This issue should be fixed there.
I saved a model trained in Keras as an .h5 file.
When I try to load it in a distributed setting, I get the error:
InvalidArgumentError (see above for traceback): Cannot assign a device
to node 'policy/dense_2_b': Could not satisfy explicit device
specification '/job:ps/task:0' because no devices matching that
specification are registered in this process; available devices:
/job:localhost/replica:0/task:0/cpu:0
It seems that the variables in the .h5 file are somehow assigned to a specific device with job "localhost", so I get this error when I try to load the model on the parameter server.
Could anyone clarify how to address this? I probably should load the Keras model first without starting the servers and then reload it on the parameter server, but the details are unclear to me.