ptxas is older than 11.1 while cuda is 11.2. WHAT'S WRONG? - tensorflow

I am not familiar with CUDA, but I have to try TensorFlow and want to accelerate it with a GPU.
I installed CUDA 11.2 according to the TF installation guide.
When I run tf.keras.Model.__call__(), I repeatedly get the warning below:
2023-02-11 18:25:17.471905: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:115] *** WARNING *** You are using ptxas 9.1.108, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.
You may not need to update to CUDA 11.1; cherry-picking the ptxas binary is often sufficient.
It suggests I need CUDA 11.1, but I already installed CUDA 11.2.
Do you have any idea what I should do to update ptxas?
My environment is listed below.
Thank you
Environment
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ dpkg -l | grep -i cuda
ii cuda 11.2.2-1 amd64 CUDA meta-package
ii cuda-11-2 11.2.2-1 amd64 CUDA 11.2 meta-package
ii cuda-command-line-tools-11-2 11.2.2-1 amd64 CUDA command-line tools
ii cuda-compiler-11-2 11.2.2-1 amd64 CUDA compiler
ii cuda-cudart-11-2 11.2.152-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-11-2 11.2.152-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-11-2 11.2.152-1 amd64 CUDA cuobjdump
ii cuda-cupti-11-2 11.2.152-1 amd64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-11-2 11.2.152-1 amd64 CUDA profiling tools interface.
ii cuda-cuxxfilt-11-2 11.2.152-1 amd64 CUDA cuxxfilt
ii cuda-demo-suite-11-2 11.2.152-1 amd64 Demo suite for CUDA
ii cuda-documentation-11-2 11.2.154-1 amd64 CUDA documentation
ii cuda-driver-dev-11-2 11.2.152-1 amd64 CUDA Driver native dev stub library
ii cuda-drivers 525.85.12-1 amd64 CUDA Driver meta-package, branch-agnostic
ii cuda-drivers-525 525.85.12-1 amd64 CUDA Driver meta-package, branch-specific
ii cuda-gdb-11-2 11.2.152-1 amd64 CUDA-GDB
ii cuda-libraries-11-2 11.2.2-1 amd64 CUDA Libraries 11.2 meta-package
ii cuda-libraries-dev-11-2 11.2.2-1 amd64 CUDA Libraries 11.2 development meta-package
ii cuda-memcheck-11-2 11.2.152-1 amd64 CUDA-MEMCHECK
ii cuda-nsight-11-2 11.2.152-1 amd64 CUDA nsight
ii cuda-nsight-compute-11-2 11.2.2-1 amd64 NVIDIA Nsight Compute
ii cuda-nsight-systems-11-2 11.2.2-1 amd64 NVIDIA Nsight Systems
ii cuda-nvcc-11-2 11.2.152-1 amd64 CUDA nvcc
ii cuda-nvdisasm-11-2 11.2.152-1 amd64 CUDA disassembler
ii cuda-nvml-dev-11-2 11.2.152-1 amd64 NVML native dev links, headers
ii cuda-nvprof-11-2 11.2.152-1 amd64 CUDA Profiler tools
ii cuda-nvprune-11-2 11.2.152-1 amd64 CUDA nvprune
ii cuda-nvrtc-11-2 11.2.152-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-11-2 11.2.152-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-11-2 11.2.152-1 amd64 NVIDIA Tools Extension
ii cuda-nvvp-11-2 11.2.152-1 amd64 CUDA Profiler tools
ii cuda-runtime-11-2 11.2.2-1 amd64 CUDA Runtime 11.2 meta-package
ii cuda-samples-11-2 11.2.152-1 amd64 CUDA example applications
ii cuda-sanitizer-11-2 11.2.152-1 amd64 CUDA Sanitizer
ii cuda-toolkit-11-2 11.2.2-1 amd64 CUDA Toolkit 11.2 meta-package
ii cuda-tools-11-2 11.2.2-1 amd64 CUDA Tools meta-package
ii cuda-visual-tools-11-2 11.2.2-1 amd64 CUDA visual tools
ii libcudart9.1:amd64 9.1.85-3ubuntu1 amd64 NVIDIA CUDA Runtime Library
ii libcudnn8 8.1.1.33-1+cuda11.2 amd64 cuDNN runtime libraries
ii libcudnn8-dev 8.1.1.33-1+cuda11.2 amd64 cuDNN development libraries and headers
rc libcufile-12-0 1.5.1.14-1 amd64 Library for GPU Direct Storage with CUDA 12.0
ii libcusolver-11-2 11.1.0.152-1 amd64 CUDA solver native runtime libraries
ii libcusolver-dev-11-2 11.1.0.152-1 amd64 CUDA solver native dev links, headers
ii libnvrtc9.1:amd64 9.1.85-3ubuntu1 amd64 CUDA Runtime Compilation (NVIDIA NVRTC Library)
ii nvidia-cuda-dev 9.1.85-3ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-doc 9.1.85-3ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-cuda-gdb 9.1.85-3ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 9.1.85-3ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-profiler 9.1.85-3ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-visual-profiler 9.1.85-3ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
$ python3 -m pip list | grep tensorflow
tensorflow 2.11.0
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.30.0
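
A quick way to narrow this down is to check which ptxas binary is actually found and what version it reports. The sketch below is only a diagnostic, and it assumes TensorFlow resolves ptxas the same way the shell does (through PATH), which the post does not confirm. The package list above shows Ubuntu's nvidia-cuda-toolkit 9.1.85 installed next to CUDA 11.2, so an older ptxas from that package may be the one being picked up. The function names here are made up for illustration; the version pattern follows the usual "release X.Y, VX.Y.Z" line that the CUDA tools print.

```python
import re
import shutil
import subprocess

# Threshold quoted in the TensorFlow warning.
MIN_PTXAS = (11, 1)


def parse_ptxas_version(version_text):
    """Extract (major, minor, patch) from `ptxas --version` output, or None."""
    match = re.search(r"V(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_too_old(version, minimum=MIN_PTXAS):
    """True when the (major, minor) part of the version sorts before the minimum."""
    return version[:2] < minimum


def report_ptxas():
    """Print which ptxas is first on PATH and whether it is older than 11.1."""
    path = shutil.which("ptxas")
    if path is None:
        print("no ptxas found on PATH")
        return
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = parse_ptxas_version(result.stdout)
    print(f"ptxas on PATH: {path}")
    if version is None:
        print("could not parse the version output")
    elif is_too_old(version):
        print(f"version {version} is older than {MIN_PTXAS}")
    else:
        print(f"version {version} is new enough")


if __name__ == "__main__":
    report_ptxas()
```

If the path printed is not under the CUDA 11.2 installation, that would be consistent with the 9.1.108 in the warning.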

Related

How to find the JetPack version of NVIDIA Jetson Device?

Is there a way to find the version of the currently installed JetPack on my NVIDIA Jetson Xavier AGX kit?
To get the JetPack version, architecture and dependencies,
sudo apt-cache show nvidia-jetpack
#Package: nvidia-jetpack
#Version: 4.4.1-b50
#Architecture: arm64
#Maintainer: NVIDIA Corporation
#Installed-Size: 194
#Depends: nvidia-cuda (= 4.4.1-b50), nvidia-opencv (= 4.4.1-b50), nvidia-cudnn8 (= 4.4.1-b50)
For the version specifically,
sudo apt-cache show nvidia-jetpack | grep "Version"
#Version: 4.4.1-b50
git clone https://github.com/jetsonhacks/jetsonUtilities.git
cd jetsonUtilities
python jetsonInfo.py
Output:
NVIDIA Jetson Nano (Developer Kit Version)
L4T 32.5.1 [ JetPack 4.5.1 ]
Ubuntu 18.04.5 LTS
Kernel Version: 4.9.201-tegra
CUDA 10.2.89
CUDA Architecture: 5.3
OpenCV version: 3.4.17-dev
OpenCV Cuda: YES
CUDNN: 8.0.0.180
TensorRT: 7.1.3.0
Vision Works: 1.6.0.501
VPI: ii libnvvpi1 1.0.15 arm64 NVIDIA Vision Programming Interface library
Vulcan: 1.2.70
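
If you need the version inside a script rather than on the command line, the apt-cache approach above can be wrapped in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it assumes the real apt-cache output has plain "Key: value" lines (the leading # in the listing above looks like it was added for the post).

```python
import subprocess


def parse_package_version(apt_cache_output):
    """Return the first 'Version' field from `apt-cache show` output, or None."""
    for line in apt_cache_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Version":
            return value.strip()
    return None


def jetpack_version():
    """Run `apt-cache show nvidia-jetpack` and return its Version field."""
    result = subprocess.run(
        ["apt-cache", "show", "nvidia-jetpack"],
        capture_output=True,
        text=True,
    )
    return parse_package_version(result.stdout)


if __name__ == "__main__":
    print(jetpack_version())
```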

I already have a CUDA toolkit installed, why is conda installing CUDA again?

I have installed CUDA 11.2 and cuDNN 8.1 on Ubuntu:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
When I install tensorflow-gpu in a conda environment, it installs CUDA and cuDNN again.
Why is this happening?
How to stop conda from installing cuda and cudnn again?
Can I just use cuda and cudnn that I have already installed? If yes, how?
Why is it happening?
Conda expects to manage any packages you install and all their dependencies. The intention is that you literally never have to install anything else by hand for any packages they distribute in their own channel. If a GPU accelerated package requires a CUDA runtime, conda will try to select and install a correctly versioned CUDA runtime for the version of the Python package it has selected for installation.
How to stop conda from installing cuda and cudnn again?
You probably can't, or at least can't without winding up with a non-functional Tensorflow installation. But see here -- what conda installs is only the necessary, correctly versioned CUDA runtime components to make their GPU accelerated packages work. All they don't/can't install is a GPU driver for the hardware.
Can I just use cuda and cudnn that I have already installed?
You say you installed CUDA 11.2. If you look at the conda output, you can see that it wants to install a CUDA 10.2 runtime. As you are now fully aware, versioning is critical to Tensorflow and a Tensorflow build requiring CUDA 10.2 won't work with CUDA 11.2. So even if you were to stop conda from performing the dependency installation, there is a version mismatch so it wouldn't work.
If yes, how?
See above.
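
To make the mismatch concrete, here is a small sketch that pulls the release number out of nvcc's output and compares it with the runtime version a package asks for. The function names are hypothetical, and treating "same major.minor" as the compatibility rule is a simplification that mirrors the argument above, not an official TensorFlow or conda check.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of nvcc output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_major_minor(version):
    """Turn a version string such as '10.2' into (10, 2)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_release(nvcc_output, required):
    """True when the installed toolkit matches the required major.minor."""
    return cuda_release(nvcc_output) == parse_major_minor(required)


if __name__ == "__main__":
    installed = "Cuda compilation tools, release 11.2, V11.2.67"
    # The installed toolkit is 11.2 while conda wants a 10.2 runtime.
    print(same_release(installed, "10.2"))
```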

Can't load CUDA. Tensorflow attempting to load wrong version of cudart64

I can't get TensorFlow to recognize the 11.1 version of CUDA on my machine.
I've updated path files to the new location but whenever I run training I see it looking for an old version 'cudart64_101.dll' that no longer exists in my file system.
'Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found'
The current version of the cudart64 dll installed and in my path file is cudart64_110.dll, not the cudart64_101.dll that it is searching for.
I have a Windows machine and verified the CUDA installation:
nvidia-smi
Sun Nov 22 12:11:15 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.81 Driver Version: 456.81 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:54:10_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.relgpu_drvr455TC455_06.29190527_0
How do I get tensorflow to look for the correct cudart64_110.dll file?
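
For what it's worth, the number in the DLL name appears to encode the CUDA version that the TensorFlow build was compiled against, so cudart64_101.dll would point to CUDA 10.1 rather than anything installed now. The sketch below decodes that suffix. It is an illustration with a hypothetical function name, and it assumes the naming scheme where the last digit is the minor version; as this post shows, a CUDA 11.1 install still carries cudart64_110.dll, so the suffix does not always equal the exact toolkit minor.

```python
import re


def cuda_version_from_cudart(text):
    """Decode (major, minor) from the first cudart64_<digits>.dll found in text.

    Assumes the last digit is the minor version and the rest is the major
    version. Returns None when no such name is present.
    """
    match = re.search(r"cudart64_(\d+)\.dll", text)
    if match is None:
        return None
    digits = match.group(1)
    if len(digits) < 2:
        return None
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    error = "Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found"
    print(cuda_version_from_cudart(error))
    print(cuda_version_from_cudart("cudart64_110.dll"))
```

If the decoded version differs from the installed toolkit, the installed TensorFlow package was most likely built for a different CUDA release, and editing the path will not change which DLL it looks for.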

How to install older versions of CUDA in conda environment?

I've been struggling to install an older version of CUDA in a conda environment. In practice, I need to install TensorFlow 1.14 and CUDA 8.0, with the nvcc compiler.
This is what I've tried:
conda create -n tf114-cuda8 tensorflow-gpu=1.14 cudatoolkit=8.0 python=3.6
source activate tf114-cuda8
At this point, nvcc is not available, because it is not included with default conda installation. Thus, I ran:
conda install -c conda-forge nvcc_linux-64=9.2
..but nothing! Now I have nvcc under the environment directory: ~/venvs/conda/miniconda3/envs/tf114-cuda8/, but it seems it is not the right one? In fact, if I do:
nvcc --version
it returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
What has gone wrong? Why is the nvcc version 10.1? How can I install TensorFlow 1.14 with CUDA 8.0 and nvcc included?
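
One possible explanation (an assumption, not something the post confirms) is that more than one nvcc is reachable and the first one on PATH is not the one you expect. The sketch below lists every nvcc on the search path in lookup order, so you can see whether the copy in the environment directory or a system-wide copy wins. The function name is hypothetical.

```python
import os


def find_all_on_path(name, path=None):
    """List every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for position, hit in enumerate(find_all_on_path("nvcc"), start=1):
        print(position, hit)
```

The first entry printed is the one the shell runs when you type nvcc.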

How to debug a segmentation fault 11 on TensorFlow?

I installed cuda 8 and the new tensorflow 1.0.
When I run "import tensorflow as tf" I get the following:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.8.0.dylib locally
Segmentation fault: 11
Knowing that nvcc -V gives the following:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Oct_30_22:18:43_CDT_2016
Cuda compilation tools, release 8.0, V8.0.54
Any idea how to fix this segmentation fault?
You might be missing a library in your local cuda installation. E.g., /usr/local/cuda/lib/libcuda.dylib was missing for me after trying to install CUDA Toolkit 8.0 locally (possibly because I installed the drivers first before the toolkit, as this ancient thread suggests: https://render.otoy.com/forum/viewtopic.php?f=25&t=1859). Re-running the installer for just the driver installed it properly, and also symlinked it to another name (https://github.com/tensorflow/tensorflow/issues/3263#issuecomment-232184358).
Lastly, double check your environment variable paths, see if echo $DYLD_LIBRARY_PATH looks right.
As an aside, I still saw some warnings when testing the install, e.g. "The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations." These only suggest building from source (https://github.com/tensorflow/tensorflow/issues/8037) rather than using pip install --upgrade tensorflow-gpu. 🍻