TensorFlow loss function is NaN when using GPU

I am trying to train a custom object detection model using a pre-trained model from the TensorFlow 1 Model Zoo.
I am using the model ssd_mobilenet_v2_coco_2018_03_29.
I set up the training environment by following this tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/tensorflow-1.14/training.html
The problem is that when I train the model with tensorflow-gpu==1.14.0, I always get the error Model diverged with loss = NaN.
I then uninstalled tensorflow-gpu==1.14.0 and installed tensorflow==1.14.0 (so it does not use my GPU), and all of a sudden it started to work!
I have no idea how that is possible.
The command I am using:
python model_main.py --alsologtostderr --model_dir=models\ssd_mobilenet_v2_coco_2018_03_29\export --pipeline_config_path=models\ssd_mobilenet_v2_coco_2018_03_29\pipeline.config --num_train_steps=2000
Python version is 3.7
OS is Windows 10
My graphics card is an NVIDIA GeForce RTX 3050; I used CUDA v10.0 and cuDNN v7.4.1.
Any ideas?

This is because RTX 30-series cards are not supported by CUDA 10. If you need TF v1 (1.15), you can install NVIDIA's build of TensorFlow 1.15, which runs on CUDA 11:
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
Note: it only supports Python 3.6 or 3.8 (not 3.7).
https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
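To make the version mismatch concrete, here is a minimal sketch of the check behind this answer. The table and the helper name cuda_supports are illustrative assumptions, not part of any library; the values reflect the CUDA release in which each compute capability was first targetable and should be verified against NVIDIA's documentation. The RTX 3050 is assumed to be compute capability 8.6.

```python
# Minimum CUDA toolkit release that can target each GPU compute capability.
# Illustrative subset only; check NVIDIA's documentation for your card.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (RTX 20 series)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series, assumed to include the RTX 3050)
}


def cuda_supports(cuda_version, capability):
    """Return True if the given CUDA toolkit version can target the capability."""
    return cuda_version >= MIN_CUDA_FOR_CAPABILITY[capability]


# CUDA 10.0 (as used by tensorflow-gpu 1.14) with an RTX 30-series card
print(cuda_supports((10, 0), (8, 6)))
# CUDA 11.1 with the same card
print(cuda_supports((11, 1), (8, 6)))
```

Under these assumptions the first call reports that the combination from the question is unsupported, which matches the advice to move to a CUDA 11 build.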

Related

Tensorflow after 1.15 - No need to install tensorflow-gpu package

Question
Please confirm that, to use both CPU and GPU with TensorFlow after 1.15, installing the tensorflow package is enough and tensorflow-gpu is no longer required.
Background
I still see articles that say to install tensorflow-gpu, e.g. pip install tensorflow-gpu==2.2.0, and the PyPI repository for the tensorflow-gpu package is active with the latest tensorflow-gpu 2.4.1.
The Anaconda documentation also still refers to the tensorflow-gpu package.
Working with GPU packages - Available packages - TensorFlow
TensorFlow is a general machine learning library, but most popular for deep learning applications. There are three supported variants of the tensorflow package in Anaconda, one of which is the NVIDIA GPU version. This is selected by installing the meta-package tensorflow-gpu:
However, according to the TensorFlow v2.4.1 (as of Apr 2021) Core document GPU support - Older versions of TensorFlow
For releases 1.15 and older, CPU and GPU packages are separate:
pip install tensorflow==1.15 # CPU
pip install tensorflow-gpu==1.15 # GPU
According to the TensorFlow Core Guide Use a GPU.
TensorFlow code, and tf.keras models will transparently run on a single GPU with no code changes required.
According to Difference between installation libraries of TensorFlow GPU vs CPU.
Just a quick (unnecessary?) note... from TensorFlow 2.0 onwards these are not separated, and you simply install tensorflow (as this includes GPU support if you have an appropriate card/CUDA installed).
Hence I would like a definite confirmation that the tensorflow-gpu package now exists only for convenience (legacy scripts that specify tensorflow-gpu, etc.) and is no longer required, i.e. that there is no longer any difference between the tensorflow and tensorflow-gpu packages.
It's reasonable to get confused here about the package naming. However, here is my understanding. For tf 1.15 or older, the CPU and GPU packages are separate:
pip install tensorflow==1.15 # CPU
pip install tensorflow-gpu==1.15 # GPU
So, if I want the CPU-only version of tf, I use the first command; if I want the GPU version, I use the second.
In tf 2.0 and above, a single command covers both kinds of hardware. Whether the system has only a CPU or also a GPU, the install command is the same:
pip install tensorflow
Now we can test it on a CPU-only system (no GPU):
import tensorflow as tf
print(tf.__version__)
print('1: ', tf.config.list_physical_devices('GPU'))
print('2: ', tf.test.is_built_with_cuda)  # not called here, so this prints the function object; use is_built_with_cuda() for True/False
print('3: ', tf.test.gpu_device_name())
print('4: ', tf.config.get_visible_devices())
2.4.1
1: []
2: <function is_built_with_cuda at 0x7f2ce91415f0>
3:
4: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
and on a system with a GPU:
2.4.1
1: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2: <function is_built_with_cuda at 0x7fb6affd0560>
3: /device:GPU:0
4: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
So, as you can see, a single command covers both the CPU and GPU cases. I hope that makes it clearer. For now (in tf >= 2), we can still install with the -gpu / -cpu suffix to get a package dedicated to GPU or CPU respectively.
!pip install tensorflow-gpu
....
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-2.4.1
# -------------------------------------------------------------
!pip install tensorflow-cpu
....
Installing collected packages: tensorflow-cpu
Successfully installed tensorflow-cpu-2.4.1
Check: Similar response from tf-team.
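Because tensorflow, tensorflow-gpu and tensorflow-cpu are separate distributions on PyPI, it can help to see which of them are actually installed in an environment. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_tensorflow_packages is hypothetical.

```python
from importlib import metadata


def installed_tensorflow_packages():
    """Map each TensorFlow distribution name to its installed version, or None."""
    names = ("tensorflow", "tensorflow-gpu", "tensorflow-cpu")
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            found[name] = None
    return found


print(installed_tensorflow_packages())
```

If more than one entry is non-None, the environment has overlapping installs, which is worth cleaning up before debugging GPU visibility.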

AttributeError: module 'tensorflow._api.v1.config' has no attribute 'set_visible_devices'

First of all, I apologize if my English is hard to understand.
I am currently doing computer vision work with TensorFlow 1.14. The following problem occurred while running the model on the GPU:
AttributeError: module 'tensorflow._api.v1.config' has no attribute 'set_visible_devices'
The current development environment is as follows.
Python: 3.7.9
conda: 4.8.3
tensorflow: 1.14.0
keras: 2.3.1
In addition, I currently have 4 GPUs, and I want to use 2 of them as if they were a single GPU. Can you suggest a good way to do this?
Thank you.
It seems you need to upgrade TensorFlow, because the tf.config.set_visible_devices() function is only available in newer versions. You can use the commands below to upgrade:
!pip install --upgrade pip
!pip install --upgrade tensorflow
You can follow the link to install the CPU/GPU version of tensorflow as per requirement and for tf.config.set_visible_devices() function related details, check here
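If upgrading is not an option, one commonly used alternative that also works with TensorFlow 1.14 is the CUDA_VISIBLE_DEVICES environment variable, which is read by the CUDA runtime rather than by TensorFlow. The sketch below is an assumption about the asker's setup (GPUs indexed 0 to 3), and the helper name restrict_visible_gpus is hypothetical. Note that this only limits which GPUs the process can see; it does not merge two GPUs into one device.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA-based libraries in this process."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before TensorFlow is imported for the first time in the process.
restrict_visible_gpus([0, 1])
# import tensorflow as tf  # import only after the variable is set
```

Inside the process, the two selected GPUs are then renumbered as devices 0 and 1.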

Google Colab Tensorflow 1.15 GPU

Does anyone know if Google Colab's GPUs are only compatible with TensorFlow 2.x? I'm trying to run TensorFlow 1 code, so I am pip installing tensorflow 1.15 and tensorflow-gpu 1.15 and changing my notebook settings to enable the GPU. However, I don't see any GPU speed-up.

xgboost install on tensorflow GPU support

I have already installed TensorFlow with GPU support.
I then tried to install xgboost in the same environment with
'conda install -c anaconda py-xgboost'
I wonder whether this xgboost has GPU support or not.
I have not followed https://xgboost.readthedocs.io/en/latest/build.html#building-with-gpu-support
I only have TensorFlow GPU support.
If I want to use xgboost with GPU support, do I need to install GPU support for xgboost separately?
You can check whether your xgboost is compiled for GPU: just try to run some model with tree_method='gpu_hist' or another GPU method (here).
If it raises an error saying that xgboost is not compiled for GPU, reinstall it following the instructions you have found.
You probably don't need to install CUDA (if you have successfully installed tensorflow-gpu and it works, CUDA must already be installed), but you definitely need a GPU-enabled build of xgboost.
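The check described above can be sketched as follows. This is a minimal probe, not an official xgboost utility: the helper name xgboost_has_gpu_support is hypothetical, the data is random, and it assumes that a build without usable GPU support raises xgboost.core.XGBoostError when asked for tree_method='gpu_hist'. Newer xgboost releases have deprecated gpu_hist, so the behaviour may differ by version.

```python
import importlib.util


def xgboost_has_gpu_support():
    """Return True/False for GPU training, or None if xgboost or numpy is missing."""
    if (importlib.util.find_spec("xgboost") is None
            or importlib.util.find_spec("numpy") is None):
        return None

    import numpy as np
    import xgboost as xgb

    # Tiny random binary classification problem, just enough to start training.
    X = np.random.rand(20, 3)
    y = np.random.randint(0, 2, size=20)
    dtrain = xgb.DMatrix(X, label=y)
    params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}
    try:
        xgb.train(params, dtrain, num_boost_round=1)
    except xgb.core.XGBoostError:
        # Assumed to mean: not compiled for GPU, or no usable GPU found.
        return False
    return True


print(xgboost_has_gpu_support())
```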

Can I implement deep learning models in my laptop with intel hd graphics

I am currently doing a project on deep learning for my master's degree. I wanted to install the Keras library, and when I started installing Theano and TensorFlow I saw that I have to install CUDA. But my laptop has Intel HD Graphics. So my question is: will it work if I install them anyway?
Thanks
Edit:
As of now, you can directly use the OpenCL-based clDNN (https://github.com/01org/clDNN) instead of OpenVX to perform deep learning inference on Intel graphics. You will have to do the training on a more powerful GPU, such as one from NVIDIA or AMD, and then use the pre-trained model in clDNN.
You can start using Intel's Computer Vision SDK (https://software.intel.com/en-us/computer-vision-sdk) in order to write Deep Learning Applications using OpenCV or OpenVX.
OpenVX (https://www.khronos.org/openvx/) programming model allows you to write simple Neural Network pipelines using the following SPEC (https://www.khronos.org/registry/OpenVX/extensions/neural_network/html/)
Alternatively you can use Model Optimizer that converts Caffe/TensorFlow model into OpenVX, and you can accelerate the OpenVX Neural Network graph on Intel Integrated HD Graphics.
Hope it helps.
You can install and use Keras without CUDA, but you can't get GPU acceleration with Intel HD Graphics.
If you use Theano as keras' backend, first install Theano:
# for python2
pip install theano
# for python3
pip3 install theano
Then set ~/.theanorc file like this:
[global]
floatX = float32
device = cpu
allow_gc = True
[blas]
ldflags = -lopenblas
If you use TensorFlow as keras' backend, just install the CPU version of TensorFlow.
# for python2.7
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0-cp27-none-linux_x86_64.whl
# for python3.4
pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0-cp34-cp34m-linux_x86_64.whl
# for python3.5
pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0-cp35-cp35m-linux_x86_64.whl
Then install keras:
# for python2
pip install keras
# for python3
pip3 install keras
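With both backends installed, the multi-backend Keras releases of that era pick the backend from the KERAS_BACKEND environment variable (or from ~/.keras/keras.json), and Theano reads settings from THEANO_FLAGS as an alternative to ~/.theanorc. Here is a minimal sketch of a CPU-only setup done from Python; the helper name configure_cpu_backend is hypothetical, and the flags mirror the .theanorc values above.

```python
import os


def configure_cpu_backend(backend="theano"):
    """Select the Keras backend and force CPU execution via environment variables."""
    if backend not in ("theano", "tensorflow"):
        raise ValueError("backend must be 'theano' or 'tensorflow'")
    os.environ["KERAS_BACKEND"] = backend
    if backend == "theano":
        # Same settings as the [global] section of ~/.theanorc shown above.
        os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32"
    return backend


# Must run before keras is imported for the first time in the process.
configure_cpu_backend("theano")
# import keras  # import only after the variables are set
```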
There is also PlaidML, with which you can train deep learning models on Intel and AMD GPUs.