Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.6.0; Ubuntu 18.04 - tensorflow

I am trying to address the issue in the title:
Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.6.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version
I have read several other posts (example: Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100))
that basically tells me that my machine has CuDNN 7.1.2 but I need 7.6.0. The answer is then to download and install 7.6.*
the only issue is that I thought I did that by following the instructions on nvidia archive (https://developer.nvidia.com/rdp/cudnn-archive)
and if I go to /usr/local/cuda/include and read cudnn.h it shows
#if !defined(CUDNN_H_)
#define CUDNN_H_
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4
Currently I have CUDA-10.0, 10.1, and 10.2 installed with my .bashrc set to 10.0 (although nvcc --version states I have cuda 9.1 --another issue I cant seem to fix).
Any suggestions? I have been trying to tackle this for days but no luck.
UPDATE:
Here are the paths I have
export PATH=$PATH:/usr/local/cuda-10.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64
export CUDA_HOME=/usr/local/cuda
Before this is closed could you help with either suggesting a proper path to set or to find old cudnn please?

I hit a very similar error:
Loaded runtime CuDNN library: 7.1.4 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
and tracked it down to accidentally having an older CuDNN in my ldconfig:
$ sudo ldconfig -p | grep libcudnn
libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.0/lib64/libcudnn.so.7
libcudnn.so.7 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn.so.7
libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.0/lib64/libcudnn.so
libcudnn.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn.so
The libcudnn.so.7 file in the cuda-9.0 directory was pointing to the older version:
ls -alh /usr/local/cuda-9.0/lib64/libcudnn.so.7
lrwxrwxrwx 1 root root 17 Dec 16 2018 /usr/local/cuda-9.0/lib64/libcudnn.so.7 -> libcudnn.so.7.1.4
But I had compiled tensorflow against the newer version:
ls -alh /usr/lib/x86_64-linux-gnu/libcudnn.so.7
lrwxrwxrwx 1 root root 17 Oct 27 2019 /usr/lib/x86_64-linux-gnu/libcudnn.so.7 -> libcudnn.so.7.6.5
Since ldconfig uses /etc/ld.so.conf to determine where to look for libraries (I guess in conjunction with LD_LIBRARY_PATH), I checked it and it showed:
include /etc/ld.so.conf.d/*.conf
When I listed the files in that directory, I spotted the problem file and removed it:
$ cat /etc/ld.so.conf.d/cuda9.conf
/usr/local/cuda-9.0/lib64
$ sudo rm /etc/ld.so.conf.d/cuda9.conf
After that I re-ran ldconfig to reload the config, and then everything worked as expected and the error disappeared.

Related

Intel OneAPI 2022 - libimf.so No such file or directory - during openMPI compilation

trying to compile openmpi with intel oneapi 2022.0.1 compilers
OS is 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I got the intel compilers as follows (just to make sure I didn't mess anything up at that step)
sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
sudo apt install intel-basekit
sudo apt install intel-hpckit
Configuring openmpi with:
./configure --prefix=${HPCX_HOME}/ompi-icc CC=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/icc CXX=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/icpc F77=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/ifort FC=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/ifort --with-ucx=/usr --with-platform=contrib/platform/mellanox/optimized
my .bashrc has (root has the same .bashrc)
source /opt/intel/oneapi/setvars.sh
export LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/2022.0.1/linux/compiler/lib/intel64_lin
After configure I do : sudo make all install and get the following error:
ld: /opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/../../bin/intel64/../../lib/icx-lto.so: error loading plugin: libimf.so: cannot open shared object file: No such file or directory
There is no ifortvars.sh with this new version of oneAPI which seems to have solved similar issues for others in the past.
libimf.so is in:
/opt/intel/oneapi/itac/2021.5.0/bin/rtlib/libimf.so
/opt/intel/oneapi/compiler/2022.0.1/linux/compiler/lib/intel64_lin/libimf.so
/opt/intel/oneapi/intelpython/python3.9/pkgs/intel-cmplr-lib-rt-2022.0.1-intel_3633/lib/libimf.so
/opt/intel/oneapi/intelpython/python3.9/lib/libimf.so
/opt/intel/oneapi/intelpython/python3.9/envs/2022.0.1/lib/libimf.so
Any help and/or advice regarding compiling openmpi with recent intel compilers would be appreciated.
Here is the solution I found but doubt that this is the most elegant way of doing this:
OS is 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
my .bashrc has (root has the same .bashrc)
source /opt/intel/oneapi/setvars.sh
created intel_libs.conf in
/etc/ld.so.conf.d/ and added the line /opt/intel/oneapi/compiler/2022.0.1/linux/compiler/lib/intel64_lin this is where the libimf.so lives.
sudo ldconfig
compiled openmpi with intel compilers fine after that using:
./configure --prefix={HPCX_HOME}/ompi-icc CC=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/icc CXX=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/icpc F77=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/ifort FC=/opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/ifort --with-ucx=/usr --with-platform=contrib/platform/mellanox/optimized
sudo make all
sudo make install
I hope this helps someone else and please let me know if there is a better way of doing this. Cheers
I am also facing similar issue not exactly. I installed the new ifort using the following command lines:
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18236/l_BaseKit_p_2021.4.0.3422.sh
sudo bash l_BaseKit_p_2021.4.0.3422.sh
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18211/l_HPCKit_p_2021.4.0.3347.sh
sudo bash l_HPCKit_p_2021.4.0.3347.sh
There is no file called setvars.sh inside the source /opt/intel/oneapi/, also while compiling any mpiifort file it is throwing me an error saying:
error loading plugin: libimf.so: cannot open shared object file: No such file or directory
Not even sure, if this is related to this thread or not, but any further guidance would be very helpful, thanks in advance.

ValueError: invalid literal for int() with base 10: '' while building tensorflow from source with gpu support [duplicate]

When I install tensorflow-gpu through Conda; it gives me the following output:
conda install tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/psychotechnopath/anaconda3/envs/DeepLearning3.6
added / updated specs:
- tensorflow-gpu
The following packages will be downloaded:
package | build
---------------------------|-----------------
_tflow_select-2.1.0 | gpu 2 KB
cudatoolkit-10.1.243 | h6bb024c_0 347.4 MB
cudnn-7.6.5 | cuda10.1_0 179.9 MB
cupti-10.1.168 | 0 1.4 MB
tensorflow-2.1.0 |gpu_py36h2e5cdaa_0 4 KB
tensorflow-base-2.1.0 |gpu_py36h6c5654b_0 155.9 MB
tensorflow-gpu-2.1.0 | h0d30ee6_0 3 KB
------------------------------------------------------------
Total: 684.7 MB
The following NEW packages will be INSTALLED:
cudatoolkit pkgs/main/linux-64::cudatoolkit-10.1.243-h6bb024c_0
cudnn pkgs/main/linux-64::cudnn-7.6.5-cuda10.1_0
cupti pkgs/main/linux-64::cupti-10.1.168-0
tensorflow-gpu pkgs/main/linux-64::tensorflow-gpu-2.1.0-h0d30ee6_0
I see that installing tensorflow-gpu automatically triggers the installation of the cudatoolkit and cudnn. Does this mean that I no longer need to install CUDA and CUDNN manually anymore to be able to use tensorflow-gpu? Where does this conda installation of CUDA reside?
I first installed CUDA and CuDNN the old way (e.g. by following these installation instructions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html )
And then I noticed that tensorflow-gpu was also installing cuda and cudnn
Do i now have two versions of CUDA/CuDNN installed and how do I check this?
Do i now have two versions of CUDA installed and how do I check this?
No.
conda installs the bare minimum redistributable library components required to support the CUDA accelerated packages they offer. The package name cudatoolkit is a complete misnomer. It is nothing of the sort. Even though it is now greatly expanded in scope from what it used to be (literally 5 files -- I think at some point they must have gotten a licensing deal from NVIDIA because some of this wasn't/isn't on the official "freely redistributable" list AFAIK), it still is basically just a handful of libraries.
You can check this for yourself:
cat /opt/miniconda3/conda-meta/cudatoolkit-10.1.168-0.json
{
"build": "0",
"build_number": 0,
"channel": "https://repo.anaconda.com/pkgs/main/linux-64",
"constrains": [],
"depends": [],
"extracted_package_dir": "/opt/miniconda3/pkgs/cudatoolkit-10.1.168-0",
"features": "",
"files": [
"lib/cudatoolkit_config.yaml",
"lib/libcublas.so",
"lib/libcublas.so.10",
"lib/libcublas.so.10.2.0.168",
"lib/libcublasLt.so",
"lib/libcublasLt.so.10",
"lib/libcublasLt.so.10.2.0.168",
"lib/libcudart.so",
"lib/libcudart.so.10.1",
"lib/libcudart.so.10.1.168",
"lib/libcufft.so",
"lib/libcufft.so.10",
"lib/libcufft.so.10.1.168",
"lib/libcufftw.so",
"lib/libcufftw.so.10",
"lib/libcufftw.so.10.1.168",
"lib/libcurand.so",
"lib/libcurand.so.10",
"lib/libcurand.so.10.1.168",
"lib/libcusolver.so",
"lib/libcusolver.so.10",
"lib/libcusolver.so.10.1.168",
"lib/libcusparse.so",
"lib/libcusparse.so.10",
"lib/libcusparse.so.10.1.168",
"lib/libdevice.10.bc",
"lib/libnppc.so",
"lib/libnppc.so.10",
"lib/libnppc.so.10.1.168",
"lib/libnppial.so",
"lib/libnppial.so.10",
"lib/libnppial.so.10.1.168",
"lib/libnppicc.so",
"lib/libnppicc.so.10",
"lib/libnppicc.so.10.1.168",
"lib/libnppicom.so",
"lib/libnppicom.so.10",
"lib/libnppicom.so.10.1.168",
"lib/libnppidei.so",
"lib/libnppidei.so.10",
"lib/libnppidei.so.10.1.168",
"lib/libnppif.so",
"lib/libnppif.so.10",
"lib/libnppif.so.10.1.168",
"lib/libnppig.so",
"lib/libnppig.so.10",
"lib/libnppig.so.10.1.168",
"lib/libnppim.so",
"lib/libnppim.so.10",
"lib/libnppim.so.10.1.168",
"lib/libnppist.so",
"lib/libnppist.so.10",
"lib/libnppist.so.10.1.168",
"lib/libnppisu.so",
"lib/libnppisu.so.10",
"lib/libnppisu.so.10.1.168",
"lib/libnppitc.so",
"lib/libnppitc.so.10",
"lib/libnppitc.so.10.1.168",
"lib/libnpps.so",
"lib/libnpps.so.10",
"lib/libnpps.so.10.1.168",
"lib/libnvToolsExt.so",
"lib/libnvToolsExt.so.1",
"lib/libnvToolsExt.so.1.0.0",
"lib/libnvblas.so",
"lib/libnvblas.so.10",
"lib/libnvblas.so.10.2.0.168",
"lib/libnvgraph.so",
"lib/libnvgraph.so.10",
"lib/libnvgraph.so.10.1.168",
"lib/libnvjpeg.so",
"lib/libnvjpeg.so.10",
"lib/libnvjpeg.so.10.1.168",
"lib/libnvrtc-builtins.so",
"lib/libnvrtc-builtins.so.10.1",
"lib/libnvrtc-builtins.so.10.1.168",
"lib/libnvrtc.so",
"lib/libnvrtc.so.10.1",
"lib/libnvrtc.so.10.1.168",
"lib/libnvvm.so",
"lib/libnvvm.so.3",
"lib/libnvvm.so.3.3.0"
]
.....
i.e. what you get is (keeping in mind most of those "files" above are just symlinks)
CUBLAS runtime
The CUDA runtime library
CUFFT runtime
CUrand runtime
CUsparse rutime
CUsolver runtime
NPP runtime
nvblas runtime
NVTX runtime
NVgraph runtime
NVjpeg runtime
NVRTC/NVVM runtime
The CUDNN package that conda installs is the redistributable binary distribution which is identical to what NVIDIA distribute -- which is exactly two files, a header file and a library.
You would still require a supported NVIDIA driver installation to make the tensorflow which conda installs work.
If you want to actually compile and build CUDA code, you need to install a separate CUDA toolkit which contains all the the development components which conda deliberately omits from their distribution.

Installing Rakudo on Android with ARM processor architecture

I am trying to install Rakudo on my Android with armv7l processor architecture using Termux.
I tried compiling from source, but it didn't work. Then someone pointed out the Termux user its-pointless and his package for this, but that package does not work on my phone.
How can I run Raku on my phone while it is offline? I'm open to solutions not using Termux.
Termux on SSH results:
u0_a74#localhost ~/rakudo [100]> pkg show rakudo -a
Package: rakudo Version: 2020.05 Maintainer: Termux members #termux
Installed-Size: 37.7 MB Depends: moarvm Homepage: https://rakudo.org
Download-Size: 5062 kB APT-Manual-Installed: yes APT-Sources:
https://its-pointless.github.io/files/24 termux/extras arm Packages
Description: Perl 6 implementation on top of Moar virtual machine
Package: rakudo Version: 2020.01-1 Maintainer: Fredrik Fornwall
#fornwall Installed-Size: 93.1 MB Depends: moarvm Homepage:
https://rakudo.org Download-Size: 10.9 MB APT-Sources:
https://its-pointless.github.io/files/24 termux/extras arm Packages
Description: Perl 6 implementation on top of Moar virtual machine
u0_a74#localhost ~/rakudo> raku
CANNOT LINK EXECUTABLE "raku": cannot locate symbol "ffi_type_double"
referenced by "/data/data/com.termux/files/usr/lib/libmoar.so"...
u0_a74#localhost ~/rakudo> raku --version
CANNOT LINK EXECUTABLE "raku": cannot locate symbol "ffi_type_double"
referenced by "/data/data/com.termux/files/usr/lib/libmoar.so"...
u0_a74#localhost ~/rakudo> raku --help
CANNOT LINK EXECUTABLE "raku": cannot locate symbol "ffi_type_double"
referenced by "/data/data/com.termux/files/usr/lib/libmoar.so"...
u0_a74#localhost ~/rakudo> uname -a
Linux localhost 3.4.42-g3d041de #1 SMP PREEMPT Sat Dec 24 19:56:29 PST
2016 armv7l Android
Does it have to be on Termux? I have successfully installed Raku on Android via UserLand, using Debian SSH. sudo apt-get install rakudo works.

How can I downgrade cuDNN from v8.0 to v6.0 in Google Colab?

This follows on from this question: How to downgrade Cuda and cuDNN Version in Google Colab?
I've followed the instructions and have first downgraded Cuda 10.0 to Cuda 8.0. I've then downloaded cuDNN v6.0 to my local drive and uploaded to my Google Drive which I have mounted. I then install with the following code:
!dpkg -i "~/Downloads/libcudnn6_6.0.21-1+cuda8.0_amd64.deb"
!ls -l /usr/lib/x86_64-linux-gnu/libcudnn.so.6*
I get a message in the log as follows:
/sbin/ldconfig.real: /usr/local/lib/python3.6/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link
It does appear to have been installed:
lrwxrwxrwx 1 root root 18 Apr 12 2017 /usr/lib/x86_64-linux-gnu/libcudnn.so.6 -> libcudnn.so.6.0.21
-rw-r--r-- 1 root root 154322864 Apr 12 2017 /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21
When I check where cuDNN is installed I get:
cudnn: /usr/include/cudnn.h
Checking the version I get:
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
However, when I try to use my model I receive:
no CUDA-capable device is detected
I'm confused. Have I installed the correct version of cuDNN properly?

OpenSSL header does not match current version when compiling

When trying to compile freeRADIUS server 2.2.3, I've gotten the following error message:
checking for OpenSSL version >= 0.9.7... yes
checking OpenSSL library and header version consistency... library: 90819f header: 90812f... no
configure: error: in `/Users/tyrexionibus/Downloads/freeradius-server-2.2.3':
configure: error: OpenSSL library version does not match header version
Actually, the openSSL version is:
tyrexionibus$ openssl version
OpenSSL 1.0.1f 6 Jan 2014
And the header, located in /usr/include/openssl/opensslv.h contains:
#define OPENSSL_VERSION_NUMBER 0x0090819fL
Editing it doesn't solve the issue.
How do I solve this?
try to remove libssl-dev and install libssl1.0-dev
sudo apt remove libssl-dev
sudo apt install libssl1.0-dev
That worked for me
The problem is that often the compiler and linker search paths aren't consistent.
By default (unless modified with -isystem or -I) the GCC search path is:
/usr/local/include
libdir/gcc/target/version/include
/usr/target/include
/usr/include
By default (unless modified with -L) Apple's linker's search path is:
/usr/lib
/usr/local/lib
and by default (at least with 2.23.52.20130913 on Ubuntu 13.04) (unless modified with -L) the GNU linker's search path is:
/usr/local/lib:
/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu/mesa:
/lib:
/usr/lib:
The linker and the compiler may pick up completely different versions of the library's headers and binaries when multiple versions are installed on the system. The compiler may then emit code which is incompatible with the library's ABI, with undefined and usually undesirable behaviour. This is why the check was added.
To ensure consistency you should pass the --with-openssl-includes= and --with-openssl-libraries= flags to the configure script. These directories will then be searched first by the compiler and linker.
./configure --with-openssl-includes=/usr/include --with-openssl-libraries=/usr/lib
will result in the bundled or packaged OpenSSL libraries/headers being used on most systems.
Another option is to set LD_LIBRARY_PATH at configure time, though you'll also need to set this in your init scripts, else the runtime version check (yes, we were thorough) will fail.
In OSX 10.10 (Yosemite), I have to custom install openssl using brew.
$ brew update
$ brew install openssl
$ brew link --force openssl
Verify the version.
$ openssl version
OpenSSL 1.0.2 22 Jan 2015
I can see which library it is linked to.
$ otool -L /usr/local/bin/openssl
/usr/local/bin/openssl:
/usr/local/Cellar/openssl/1.0.2/lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/local/Cellar/openssl/1.0.2/lib/libcrypto.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1213.0.0)
In my configure script I can then specify the OpenSSL path.
$ ./configure --with-ssl-dir=/usr/local/Cellar/openssl/1.0.2
You should probably check config.log generated by configure (in the same folder): it seems like you have at least 2 or even 3 versions of OpenSSL: 0.9.8r, 0.9.8y and 1.0.1f.
Explanation:
1) OPENSSL_VERSION_NUMBER = 0x0090819f in /usr/include/openssl/opensslv.h means 0.9.8y is installed into /usr;
2) output of command openssl version suggests that you have 1.0.1f somewhere in your PATH, but 1.0.1f defines OPENSSL_VERSION_NUMBER as 0x1000106fL, not 0x0090819f, so it's a different copy from the 1) above.
tyrexionibus$ openssl version
OpenSSL 1.0.1f 6 Jan 2014
3) 90812f in the output of configure means 0.9.8r.
You may also find an OpenSSL version matrix useful to match version numbers in hex (from opensslv.h) with human-readable version strings.
Just don't forget to add line into /etc/ld.so.conf
/usr/local/ssl/lib
and run ldconfig.
Without this step your libssl uses system lybcrypto instead of yours one.
Compare
BEFORE:
=======
[root]/usr/local/ssl/lib> ldd ./libssl.so
linux-vdso.so.1 (0x00007ffe4c93f000)
libcrypto.so.1.0.0 => /lib64/libcrypto.so.1.0.0 (0x00007febbdcd3000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007febbdacf000)
libc.so.6 => /lib64/libc.so.6 (0x00007febbd702000)
/lib64/ld-linux-x86-64.so.2 (0x0000558484e69000)
AFTER:
======
[root]/usr/local/ssl/lib> ldd ./libssl.so
linux-vdso.so.1 (0x00007ffcec2aa000)
libcrypto.so.1.0.0 => /usr/local/ssl/lib/libcrypto.so.1.0.0 (0x00007fa347db5000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fa347bb0000)
libc.so.6 => /lib64/libc.so.6 (0x00007fa3477e4000)
/lib64/ld-linux-x86-64.so.2 (0x00005567459c7000)
I was really happy when discovered this :=)
try this
./configure --with-openssl-lib-dir=/usr/local/openssl/lib/ --with-openssl-include-dir=/usr/local/openssl/include/
This fixed it for me...
sudo apt-get install libssl-dev