How to use system GPU in Jupyter notebook? - tensorflow

I tried a lot of things before I could finally figure out this approach. There are a lot of videos and blogs asking to install the Cuda toolkit and cuDNN from the website. Checking the compatible version. But this is not required anymore all you have to do is the following
pip install tensorflow-gpu
pip install cuda
pip install cudnn
then use the following code to check if your GPU is active in the current notebook
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.config.list_physical_devices('GPU')
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
tf.test.is_built_with_cuda()
tf.debugging.set_log_device_placement(True)
I just want to confirm, if these steps are enough to enable GPU in jupyter notebook or am I missing something here?

If you installed the compatible versions of CUDA and cuDNN (relative to your GPU), Tensorflow should use that since you installed tensorflow-gpu. If you want to be sure, run a simple demo and check out the usage on the task manager.

Related

Using Object Detection API on local GPU but not last version (v2.5.0)

I am trying to use my local GPU to train an EfficientDetD0 model. I already have a good pipeline (that works on Google Colab for example), I modified it a bit to use it locally, but one problem happens every time I launch the training.
I use conda to install tensorflow-gpu with cuda and cudnn but it makes TensorFlow v2.4.1 environments and when I launch the training the Object Detection API automatically install TensorFlow V2.5.0. So my env is not using the gpu for the training because cuda and cudnn are waiting for TensorFlow to be v2.4.1 and not v2.5.0.
Is there a way to get the Object Detection API in v2.4.1 and not v2.5.0 ?
I tried many things but it doesn't work (training is failing or going for CPU training).
Here is the code that install dependencies and overwrite TensorFlow version to TensorFlow v2.5.0:
os.system("cp object_detection/packages/tf2/setup.py .")
os.system("python -m pip install .")
SYSTEM:
gpu : Nvidia RTX 3070
os : Ubuntu 20.04 LTS
tensorflow: 2.4.1
P.S.: I go with conda install -c conda-forge tensorflow-gpu for installing TensorFlow, cuda and cudnn in my training env because manually there was a dependency problem, so I took the easy way.
EDIT : solution found explained in comments.
Follow these steps to install specific version of tensorflow gpu
1. Set Up Anaconda Environments
conda create -n tf_gpu cudatoolkit=11.0
2. Activate the new Environment
source activate tf_gpu
3. Install tensorflow-gpu 2.4.1
pip install tensorflow==2.4.1
Try to run object_detection without "installing" it. Dont run setup.py. Just setup the neccesery paths and packages manually.
Or edit the setup.py to skip installing the specific verison of TF. I quess that this version is a requirement of some of the packages installed in setup.py.
I use the object_detection without running the setup.py or doing any "installation" without any problems.

Tensorflow after 1.15 - No need to install tensorflow-gpu package

Question
Please confirm that to use both CPU and GPU with TensorFlow after 1.15, install tensorflow package is enough and tensorflow-gpu is no more required.
Background
Still see articles stating to install tensorflow-gpu e.g. pip install tensorflow-gpu==2.2.0 and the PyPi repository for tensorflow-gpu package is active with the latest tensorflow-gpu 2.4.1.
The Annaconda document also refers to tensorflow-gpu package still.
Working with GPU packages - Available packages - TensorFlow
TensorFlow is a general machine learning library, but most popular for deep learning applications. There are three supported variants of the tensorflow package in Anaconda, one of which is the NVIDIA GPU version. This is selected by installing the meta-package tensorflow-gpu:
However, according to the TensorFlow v2.4.1 (as of Apr 2021) Core document GPU support - Older versions of TensorFlow
For releases 1.15 and older, CPU and GPU packages are separate:
pip install tensorflow==1.15 # CPU
pip install tensorflow-gpu==1.15 # GPU
According to the TensorFlow Core Guide Use a GPU.
TensorFlow code, and tf.keras models will transparently run on a single GPU with no code changes required.
According to Difference between installation libraries of TensorFlow GPU vs CPU.
Just a quick (unnecessary?) note... from TensorFlow 2.0 onwards these are not separated, and you simply install tensorflow (as this includes GPU support if you have an appropriate card/CUDA installed).
Hence would like to have a definite confirmation that the tensorflow-gpu package would be for convenience (legacy script which has specified tensorflow-gpu, etc) only and no more required. There is no difference between tensorflow and tensorflow-gpu packages now.
It's reasonable to get confused here about the package naming. However, here is my understanding. For tf 1.15 or older, the CPU and GPU packages are separate:
pip install tensorflow==1.15 # CPU
pip install tensorflow-gpu==1.15 # GPU
So, if I want to work entirely on the CPU version of tf, I would go with the first command and otherwise, if I want to work entirely on the GPU version of tf, I would go with the second command.
Now, in tf 2.0 or above, we only need one command that will conveniently work on both hardware. So, in the CPU and GPU based system, we need the same command to install tf, and that is:
pip install tensorflow
Now, we can test it on a CPU based system ( no GPU)
import tensorflow as tf
print(tf.__version__)
print('1: ', tf.config.list_physical_devices('GPU'))
print('2: ', tf.test.is_built_with_cuda)
print('3: ', tf.test.gpu_device_name())
print('4: ', tf.config.get_visible_devices())
2.4.1
1: []
2: <function is_built_with_cuda at 0x7f2ce91415f0>
3:
4: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
or also test it on a CPU based system ( with GPU)
2.4.1
1: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2: <function is_built_with_cuda at 0x7fb6affd0560>
3: /device:GPU:0
4: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
So, as you can see this is just a single command for both CPU and GPU cases. Hope it's clear now more. But until now (in tf > = 2) we can also use -gpu / -cpu postfix while installing tf that delicately use for GPU / CPU respectively.
!pip install tensorflow-gpu
....
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-2.4.1
# -------------------------------------------------------------
!pip install tensorflow-cpu
....
Installing collected packages: tensorflow-cpu
Successfully installed tensorflow-cpu-2.4.1
Check: Similar response from tf-team.

Tensorflow GPU is not identifying my GPUs

I have been trying to use Tensorflow GPU, but apparently, Tersorflow is not identifying my GPUs.
When I run:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
As an output, only my CPU shows up. I have checked all of the versions of everything and they seem to be compatible. I have CUDA 10.1 with CUDA Toolkit, cuDNN 7.5 and Tensorflow 1.13.1. I am running everything on Ubuntu 18.xx
What am I doing wrong?
what is the output of:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
On my system, tensorflow is not recognizing GPU because it is a XLA_GPU. I'm not really sure why a XLA_GPU is not also a GPU, seems there is a OR statement missing somewhere in the tensorflow-gpu code.
If above code does not list any GPUs (and you have one):
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip install tensorflow-gpu
… worked for me.

xgboost install on tensorflow GPU support

I already install tensorflow GPU support.
try install xgboost on tensorflow by
'conda install -c anaconda py-xgboost'
I wonder the xgboost what GPU support or not.
I don't install https://xgboost.readthedocs.io/en/latest/build.html#building-with-gpu-support
only tensorflow GPU support.
Do i need install xgboost Gpu support or not??? if i want use xgboost with GPU support
You can check if your xgboost is compiled for gpu, just try to run some model with tree_method='gpu_hist' or another gpu method (here).
If it would raise an error that xgboost's not compiled for gpu, then reinstall it following the instructions that you have found.
Probably, you don't need install CUDA (if you have successfully installed tensorflow-gpu and it works, then CUDA must be installed already), but you definitely should build gpu-supported xgboost.

tensorflow on GPU: no known devices, despite cuda's deviceQuery returning a "PASS" result

Note : this question was initially asked on github, but it was asked to be here instead
I'm having trouble running tensorflow on gpu, and it does not seems to be the usual cuda's configuration problem, because everything seems to indicate cuda is properly setup.
The main symptom: when running tensorflow, my gpu is not detected (the code being run, and its output).
What differs from usual issues is that cuda seems properly installed and running ./deviceQuery from cuda samples is successful (output).
I have two graphical cards:
an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
a GTX 1060 that I want to dedicate to tensorflow
I use:
tensorflow-1.0.0
cuda-8.0 (ls -l /usr/local/cuda/lib64/libcud*)
cudnn-5.1.10
python-2.7.12
nvidia-drivers-375.26 (this was installed by cuda and replaced my distro driver package)
I've tried:
adding /usr/local/cuda/bin/ to $PATH
forcing gpu placement in tensorflow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when it failed, for good measure)
whitelisting the gpu I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card did cause problems
running the script with sudo (because why not)
Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case it's useful.
At this point, I feel like I have followed all the breadcrumbs and have no idea what I could try else. I'm not even sure if I'm contemplating a bug or a configuration problem. Any advice about how to debug this would be greatly appreciated. Thanks!
Update: with the help of Yaroslav on github, I gathered more debugging info by raising log level, but it doesn't seem to say much about the device selection : https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb
Update 2: Using theano detects gpu correctly, but interestingly it complains about cuDNN being too recent, then fallback to cpu (code ran, output). Maybe that could be the problem with tensorflow as well?
From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.
If you run the following commands, you should be able to use the GPU in subsequent runs:
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
None of the other answers here worked for me. After a bit of tinkering I found that this fixed my issues when dealing with Tensorflow built from binary:
Step 0: Uninstall protobuf
pip uninstall protobuf
Step 1: Uninstall tensorflow
pip uninstall tensorflow
pip uninstall tensorflow-gpu
Step 2: Force reinstall Tensorflow with GPU support
pip install --upgrade --force-reinstall tensorflow-gpu
Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES
So for me with 2 GPUs it would be
export CUDA_VISIBLE_DEVICES=0,1
In my case:
pip3 uninstall tensorflow
is not enough. Because when reinstall with:
pip3 install tensorflow-gpu
It is still reinstall tensorflow with cpu not gpu.
So, before install tensorflow-gpu, I tried to remove all related tensor folders in site-packages uninstall protobuf, and it works!
For conclusion:
pip3 uninstall tensorflow
Remove all tensor folders in ~\Python35\Lib\site-packages
pip3 uninstall protobuf
pip3 install tensorflow-gpu
Might seem dumb but a sudo reboot has fixed the exact same problem for me and a couple others.
The answer that saved my day came from Mark Sonn. Simply add this to .bashrc and
source ~/.bashrc if you are on Linux:
export CUDA_VISIBLE_DEVICES=0,1
Previously I had to use this workaround to get tensorflow recognize my GPU:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)
Even though the code still worked, adding these lines every time is clearly not something I would want.
My version of tensorflow was built from source according to the documentation to get v2.3 support CUDA 10.2 and cudnn 7.6.5.
If anyone having trouble with that, I suggest doing a quick skim over the docs. Took 1.5 hours to build with bazel. Make sure you have gcc7 and bazel installed.
This error may be caused by your GPU's compute capability, CUDA officially supports GPU's compute capability within 3.5 ~ 5.0, you can check here: https://en.wikipedia.org/wiki/CUDA
In my case, the error was like this:
Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
For now we can only compile from source code on Linux (or mac OS) to break the '3.5~5.0' limit.
There are various system incompatible problems.
The requirement for libraries can vary from the version of TensorFlow.
During using python in interactive mode a lot of useful information is printing into stderr. What I suggest for TensorFlow with version 2.0 or more to call:
python3.8 -c "import tensorflow as tf; print('tf version:', tf.version); tf.config.list_physical_devices()"
After this command, you will observe missing libraries (or a version of it) for work with GPU in addition to requirements:
https://www.tensorflow.org/install/gpu#software_requirements
https://www.tensorflow.org/install/gpu#hardware_requirements
p.s. CUDA_VISIBLE_DEVICES should not have a real connection with TensorFlow, or it's more general - it's a way to customize available GPUs for all launched processes.
For anaconda users. I installed tensorflow-gpu via GUI using Anaconda Navigator and configured NVIDIA GPU as in tensorflow guide but tensorflow couldn't find the GPU anyway. Then I uninstalled tensorflow, always via GUI (see here) and reinstalled it via command line in an anaconda prompt issuing:
conda install -c anaconda tensorflow-gpu
and then tensorflow could find the GPU correctly.