GPU not available with tensorflow and Torch in Ubuntu 18.04 - tensorflow

I am having trouble with my GPU: it is available in neither TensorFlow nor Torch.
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print('The device name is: ', device_name)
    # --> prints: The device name is: ''  (the returned device_name is empty)
else:
    print("GPU available")
Here is the CUDA version:
$ nvcc --version
--> nvcc: NVIDIA (R) Cuda compiler driver
--> Copyright (c) 2005-2022 NVIDIA Corporation
--> Built on Tue_May__3_18:49:52_PDT_2022
--> Cuda compilation tools, release 11.7, V11.7.64
--> Build cuda_11.7.r11.7/compiler.31294372_0
The NVIDIA driver version:
$ nvidia-smi
--> NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
The tensorflow-gpu version:
import tensorflow as tf
print(tf.__version__)
--> 2.0.0
I read that CUDA 11.7 is compatible with NVIDIA driver 515.65.01. Any ideas, please?

Related

Stop Tensorflow trying to load cuda temporarily

I have this code to disable GPU usage:
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
w = tf.Variable(
[
[1.],
[2.]
])
I still get this output and I am not sure why:
E:\MyTFProject\venv\Scripts\python.exe E:/MyTFProject/tfvariable.py
2021-11-03 14:09:16.971644: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-11-03 14:09:16.971644: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-03 14:09:19.563793: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-03 14:09:19.566793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: newtonpc
2021-11-03 14:09:19.567793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: mypc
2021-11-03 14:09:19.567793: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
TF Version: '2.6.1'
I am not able to stop it from loading the CUDA DLLs. I don't want to set up CUDA right now; maybe later.
I am using the latest PyCharm and installed TensorFlow with pip, as described on the site.
You can try reinstalling TensorFlow as the CPU-only version. The links are available here, depending on your OS and your Python version:
https://www.tensorflow.org/install/pip?hl=fr#windows_1
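If reinstalling is not convenient right away, the messages can at least be reduced. The sketch below assumes the `TF_CPP_MIN_LOG_LEVEL` environment variable (which filters TensorFlow's C++ log output) is set before TensorFlow is first imported; the helper name `configure_cpu_only` is hypothetical. It only hides the GPU and filters log lines; it does not stop TensorFlow from probing for the CUDA libraries.

```python
import os


def configure_cpu_only(log_level: str = "2") -> None:
    """Hide all CUDA devices and reduce TensorFlow's C++ log output.

    Must run before the first `import tensorflow`.
    Level "2" filters INFO and WARNING lines; "3" also filters ERROR lines.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = log_level


configure_cpu_only()

try:
    import tensorflow as tf
except ImportError:
    # TensorFlow is not installed in this interpreter; nothing else to do.
    tf = None

if tf is not None:
    w = tf.Variable([[1.0], [2.0]])
    print(w.device)  # expected to name a CPU device, since the GPUs are hidden
```

With level "2" the `cuInit` error line would still be shown, because it is logged at ERROR level; passing "3" should hide it as well, at the cost of hiding genuine errors.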

Getting CUDNN_STATUS_EXECUTION_FAILED error on Jetson Nano with OpenCV dnn module

I was following one of the online tutorials but I was getting this error:
Traceback (most recent call last):
File "ssd_object_detection.py", line 20, in
detections = net.forward()
cv2.error: OpenCV(4.3.0) /home/blah/opencv/modules/dnn/src/layers/…/cuda4dnn/primitives/…/csl/cudnn/convolution.hpp:461: error: (-217:Gpu API call) CUDNN_STATUS_EXECUTION_FAILED in function 'convolve_with_bias_activation'
It's a Python script that uses the OpenCV dnn module with a pre-trained model.
This is my configuration:
Jetson Nano device
Ubuntu 18.04
/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
-- NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
-- NVIDIA GPU arch: 53
-- NVIDIA PTX archs:
-- cuDNN: YES (ver 8.0)
OpenCV 4.3.0 built from source with OPENCV_DNN_CUDA=ON, CUDNN_VERSION='8.0', WITH_CUDA=ON, WITH_CUDNN=ON, and many other settings enabled
Python 3.7.7
This is a snippet of the code I am trying to run (it completes successfully if I don’t use the GPU). It fails at the line detections = net.forward()
CLASSES = ["background", "aeroplane"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
print("[INFO] setting preferable backend and target to CUDA...")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
print("[INFO] accessing video stream...")
vs = cv2.VideoCapture(args["input"] if args["input"] else 0)
writer = None
fps = FPS().start()
while True:
    (grabbed, frame) = vs.read()
    frame = imutils.resize(frame, width=400)
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()
    for i in np.arange(0, detections.shape[2]):
        ....

Tensorflow 2.0 can't use GPU, something wrong in cuDNN? :Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

I am trying to understand and debug my code. I am trying to run prediction with a CNN model developed under tf2.0/tf.keras on a GPU, but I get the error messages below.
Could someone help me fix this?
Here is my environment configuration:
python 3.6.8
tensorflow-gpu 2.0.0-rc0
nvidia 418.x
CUDA 10.0
cuDNN 7.6+
And the log file:
2019-09-28 13:10:59.833892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-28 13:11:00.228025: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-28 13:11:00.957534: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963310: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963416: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node mobilenetv2_1.00_192/Conv1/Conv2D}}]]
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0=====>GPU Available: True
=====> 4 Physical GPUs, 1 Logical GPUs
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "NSFW_Server.py", line 162, in <module>
model.predict(initial_tensor)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 915, in predict
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 722, in predict
callbacks=callbacks)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 393, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3625, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node mobilenetv2_1.00_192/Conv1/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_10727]
Function call stack:
keras_scratch_graph
The code
if __name__ == "__main__":
    print("=====>GPU Available: ", tf.test.is_gpu_available())
    tf.debugging.set_log_device_placement(True)
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
            tf.config.experimental.set_memory_growth(gpus[0], True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print("=====>", len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)
    paras_path = "./paras/{}".format(int(2011))
    model = tf.keras.experimental.load_from_saved_model(paras_path)
    initial_tensor = np.zeros((1, INPUT_SHAPE, INPUT_SHAPE, 3))
    model.predict(initial_tensor)
Check that you have compatible versions of CUDA, cuDNN, and TensorFlow, and that all three are actually installed.
A few examples of working configurations are listed below (updated for later versions of TensorFlow):
CUDA 11.3.1 + cuDNN 8.2.1.32 + TensorFlow 2.7.0
CUDA 11.0 + cuDNN 8.0.4 + TensorFlow 2.4.0
CUDA 10.1 + cuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.2.0/2.3.0 (TF >= 2.1 requires CUDA >= 10.1)
CUDA 10.1 + cuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.1.0 (TF >= 2.1 requires CUDA >= 10.1)
CUDA 10.0 + cuDNN 7.6.3 + TensorFlow 1.13/1.14 or TensorFlow 2.0
CUDA 9.0 + cuDNN 7.0.5 + TensorFlow 1.10
This error usually appears when incompatible versions of TensorFlow and cuDNN are installed. In my case, it appeared when I tried to use an older TensorFlow with a newer version of cuDNN.
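The list above can be turned into a quick lookup for sanity-checking an installation. This is only a sketch: the table holds just the combinations quoted in this answer (it is not exhaustive or official), and the names `KNOWN_GOOD`, `expected_toolchain`, and `cuda_matches` are made up for illustration.

```python
# Combinations copied from the list in this answer (not exhaustive).
# Keys are TensorFlow major.minor; values are (CUDA, cuDNN).
KNOWN_GOOD = {
    "2.7": ("11.3.1", "8.2.1.32"),
    "2.4": ("11.0", "8.0.4"),
    "2.3": ("10.1", "7.6.5"),
    "2.2": ("10.1", "7.6.5"),
    "2.1": ("10.1", "7.6.5"),
    "2.0": ("10.0", "7.6.3"),
    "1.14": ("10.0", "7.6.3"),
    "1.13": ("10.0", "7.6.3"),
    "1.10": ("9.0", "7.0.5"),
}


def expected_toolchain(tf_version: str):
    """Return the (CUDA, cuDNN) pair listed for a TensorFlow version, or None."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return KNOWN_GOOD.get(major_minor)


def cuda_matches(tf_version: str, installed_cuda: str) -> bool:
    """True when the installed CUDA major.minor equals the listed one."""
    expected = expected_toolchain(tf_version)
    if expected is None:
        return False  # version not in the table, so nothing can be confirmed
    listed = ".".join(expected[0].split(".")[:2])
    have = ".".join(installed_cuda.split(".")[:2])
    return listed == have


print(expected_toolchain("2.0.0-rc0"))
print(cuda_matches("2.0.0", "11.7"))
```

The TensorFlow version can be read from `tf.__version__` and the CUDA release from `nvcc --version`, then passed in as strings.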
If for some reason you get a message like the following (and nothing happens afterwards):
Relying on the driver to perform ptx compilation
Solution: install the latest NVIDIA driver.
[Seems to be solved in TF >= 2.5.0] (see below):
Windows users only: some recent combinations of CUDA, cuDNN, and TF may not work due to a bug (an improperly named .dll file). To handle that specific case, please consult this link: Tensorflow GPU Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
For those facing the above error on Windows: I fixed it simply by installing the cuDNN version compatible with the CUDA version already installed on the system.
The suitable version can be downloaded from NVIDIA's developer portal (Download cuDNN). You might need an NVIDIA account for it, which is easily created by providing an email address and filling in a questionnaire.
To check the CUDA version, run nvcc --version.
Once the suitable version is downloaded, extract the folder from the zip file.
Go to the bin folder of the extracted folder, copy cudnn64_7.dll, and paste it into CUDA's bin folder. In my case, CUDA is installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin.
This will most likely solve the problem.
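The copy step can also be scripted. The following is a minimal sketch, not an official installer: the helper `copy_cudnn_dll` and the extraction path are hypothetical, and writing into Program Files normally requires an elevated prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn_dll(cudnn_bin: Path, cuda_bin: Path,
                   dll_name: str = "cudnn64_7.dll") -> Path:
    """Copy the cuDNN DLL from the extracted archive into CUDA's bin folder."""
    source = cudnn_bin / dll_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; check the extracted cuDNN folder")
    if not cuda_bin.is_dir():
        raise NotADirectoryError(f"{cuda_bin} is not a directory; check the CUDA path")
    # copy2 keeps file metadata and returns the destination path
    return Path(shutil.copy2(source, cuda_bin / dll_name))


if __name__ == "__main__":
    # The first path is a hypothetical extraction folder; the second is the
    # CUDA location mentioned in this answer.
    extracted = Path(r"C:\Downloads\cudnn\cuda\bin")
    cuda = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin")
    try:
        print(copy_cudnn_dll(extracted, cuda))
    except (FileNotFoundError, NotADirectoryError, PermissionError) as err:
        print(f"Copy skipped: {err}")
```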
My system details:
Windows 10
CUDA 10.0
TensorFlow 2.0
GPU- Nvidia GTX 1060
I also found this blog post very useful: Installing TensorFlow with CUDA and GPU support on Windows 10.
Check the instructions on this TensorFlow GPU instruction page for your OS. It resolved the issue for me on Ubuntu 16.04.6 LTS with TensorFlow 2.0.

How do I reinstate the GPU version of tensorflow that was running 2 days ago on my host?

Just two days ago, after much work on my part downloading and installing the latest stable GPU version of tensorflow, my tensorflow installation was behaving correctly as I wanted, and it reported this:
$ source activate tensorflowgpu
(tensorflowgpu) ga#ga-HP-Z820:~$ python
Python 3.5.4 |Anaconda, Inc.| (default, Nov 20 2017, 18:44:38)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant("Hello tensorflow")
>>> sess = tf.Session()
2018-01-22 12:37:32.119300: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-01-22 12:37:33.339324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797
pciBusID: 0000:41:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-01-22 12:37:33.339414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:41:00.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello tensorflow'
Sadly however, to my complete surprise today my tensorflow installation is misbehaving because it is reporting the following, that is, it's using the CPU version. What in the world happened to it, and how do I make it behave correctly again, that is, reinstate the GPU version of tensorflow? This is my privately owned workstation of which I am the sole user.
ga#ga-HP-Z820:~$ source activate tensorflowgpu
(tensorflowgpu) ga#ga-HP-Z820:~$ ipython
Python 3.5.4 |Anaconda custom (64-bit)| (default, Nov 20 2017, 18:44:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
...
In [2]: import tensorflow as tf
...: hello = tf.constant("Hello tensorflow")
...: sess = tf.Session()
...:
2018-01-24 12:34:56.792676: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-24 12:34:56.792719: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-24 12:34:56.792729: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Now, looking closer, today my python (not ipython) does the following, which is the good behavior again. That's strange: the two pythons load different tensorflows. So how do I compel ipython and python to both use the GPU version of tensorflow? I really use ipython more, especially for Jupyter notebooks.
(tensorflowgpu) ga#ga-HP-Z820:~$ python
Python 3.5.4 |Anaconda, Inc.| (default, Nov 20 2017, 18:44:38)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant("Hello tensorflow")
>>> sess = tf.Session()
2018-01-24 12:50:35.846985: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-01-24 12:50:37.222662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797
pciBusID: 0000:41:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-01-24 12:50:37.222721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:41:00.0, compute capability: 6.1)
I had followed the installation instructions for TensorFlow GPU on Ubuntu Linux from the TensorFlow website. They did not say to remove the CPU version first.
Edit: I explicitly queried the tf versions, as shown below. This confirms that python and ipython are loading two different installed versions. I want both to use only the newer version of tf.
(tensorflowgpu) ga#ga-HP-Z820:~$ python
Python 3.5.4 |Anaconda, Inc.| (default, Nov 20 2017, 18:44:38)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.4.1'
>>>
(tensorflowgpu) ga#ga-HP-Z820:~$ ipython
Python 3.5.4 |Anaconda custom (64-bit)| (default, Nov 20 2017, 18:44:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import tensorflow as tf
In [2]: tf.__version__
Out[2]: '1.3.0'
You'll need to install the kernel in ipython so that it knows about your environment. Currently, ipython is picking up the default Python on your system, not the one from the environment. You can install the kernel as follows (make sure your environment is active):
pip install ipykernel
python -m ipykernel install --user --name tensorflowgpu
Now just select this kernel when running ipython.
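To confirm which interpreter and which TensorFlow installation each shell actually picks up, a small check like the one below can be run in both python and ipython. It is a sketch using only the standard library; the function name `describe_environment` is illustrative.

```python
import importlib.util
import sys


def describe_environment(package: str = "tensorflow") -> dict:
    """Report the running interpreter and where `package` would be imported from."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # None means the package is not importable from this interpreter
        "package_location": spec.origin if spec is not None else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the two shells print different `executable` or `package_location` values, they are not sharing the same environment.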

tensorflow CUDA_ERROR_UNKNOWN error

I downloaded CUDA Toolkit 8.0 GA2, cuDNN v6.0 for CUDA 8 and tensorflow 1.4. I have an Nvidia 740M graphics chip. I tried running this code to test tensorflow:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
This is what it returns:
2017-11-09 16:25:42.999638: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2017-11-09 16:25:43.007712: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-11-09 16:25:43.010457: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: DESKTOP-06398IN
2017-11-09 16:25:43.010576: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_diagnostics.cc:165] hostname: DESKTOP-06398IN
b'Hello, TensorFlow!'
What does the error CUDA_ERROR_UNKNOWN mean? How do I fix it? Thanks
Updating the graphics driver fixed it.
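Before and after a driver update, it can help to record the driver version that is actually loaded. Here is a minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the helper name `nvidia_driver_version` is hypothetical, and it returns None when the tool is missing or fails.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_driver_version() -> Optional[str]:
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    # One line is printed per GPU; the first is enough for a version check
    return lines[0].strip() if lines else None


print(nvidia_driver_version())
```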