Running YOLOv4 on WSL2 - TensorFlow

I installed everything required (CUDA, cuDNN) in WSL2. When I git clone the AlexeyAB repository, edit the Makefile, and run "make" in the terminal, it throws an error.
How can I verify that cuDNN is installed in my WSL2?
Thanks in advance.
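One way to answer the "is cuDNN installed" part is to ask the dynamic loader directly. The sketch below is only an illustration: the helper name find_cudnn and the list of candidate file names are assumptions, and it relies on cudnnGetVersion() from the cuDNN C API. It returns None when no cuDNN library can be loaded from the loader path.

```python
import ctypes
import ctypes.util


def find_cudnn():
    """Return (library_name, version_number) if cuDNN can be loaded, else None."""
    # Candidate names are assumptions: an unversioned name plus a few major versions.
    candidates = ["libcudnn.so"] + [f"libcudnn.so.{major}" for major in (9, 8, 7)]
    located = ctypes.util.find_library("cudnn")
    if located:
        candidates.insert(0, located)
    for name in candidates:
        try:
            lib = ctypes.CDLL(name)
            # cudnnGetVersion() takes no arguments and returns a size_t.
            lib.cudnnGetVersion.restype = ctypes.c_size_t
            return name, int(lib.cudnnGetVersion())
        except (OSError, AttributeError):
            # Library not found, or found but missing the expected symbol.
            continue
    return None


if __name__ == "__main__":
    print(find_cudnn() or "cuDNN not found on the loader path")
```

If this prints "cuDNN not found", the library is either not installed or its directory is not on the loader path (for example, not listed in LD_LIBRARY_PATH).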

Related

The kernel appears to have died. It will restart automatically. Jupyter notebook [duplicate]

I am using a MacBook Pro with M1 processor, macOS version 11.0.1, Python 3.8 in PyCharm, Tensorflow version 2.4.0rc4 (also tried 2.3.0, 2.3.1, 2.4.0rc0). I am trying to run the following code:
import tensorflow
This causes the error message:
Process finished with exit code 132 (interrupted by signal 4: SIGILL)
The code runs fine on my Windows and Linux machines.
What does the error message mean and how can I fix it?
This problem seems to happen when you have multiple Python interpreters installed and some of them target different architectures (x86_64 vs. arm64). You need to make sure that the correct Python interpreter is being used; if you installed Apple's version of TensorFlow, it probably requires an arm64 interpreter.
If you use Rosetta (Apple's x86_64 emulator), then you need to use an x86_64 Python interpreter; if you somehow load the arm64 Python interpreter, you will get the illegal instruction error (which totally makes sense).
If you use any script that installs new Python interpreters, then you need to make sure the correct interpreter for the architecture is installed (most likely arm64).
Overall, I think this problem happens because the Python environment setup is not made for systems that can run multiple instruction sets/architectures. pip does check the architecture of packages against the host system, but it seems you can run an x86_64 interpreter that loads a package meant for arm64, and this produces the problem.
For reference there is an issue in tensorflow_macos that people can check.
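To see which architecture the running interpreter was built for, a short check with the standard library may help. This is a minimal sketch; the helper name describe_interpreter is made up for illustration.

```python
import platform
import sys


def describe_interpreter():
    """Report which interpreter is running and the architecture it reports."""
    return {
        "executable": sys.executable,      # path of the interpreter actually in use
        "machine": platform.machine(),     # e.g. 'arm64' or 'x86_64'
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running it from the same PyCharm run configuration that fails shows whether the interpreter reports arm64 or x86_64, which can then be compared with the TensorFlow build that was installed.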
For M1 Macs, the following steps from the Apple developer page worked:
First, download Conda Env from here and then follow these instructions (assuming the script is downloaded to the ~/Downloads folder):
chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
Reload the shell and run:
python -m pip uninstall tensorflow-macos
python -m pip uninstall tensorflow-metal
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
If the above doesn't work for some reason, some edge cases and additional information are covered on the Apple developer page.
Installing Tensorflow version 1.15 fixed this for me.
$ conda install tensorflow==1.15
I have been able to resolve this issue by using Miniforge instead of Anaconda as the Python environment. Anaconda doesn't support the arm64 architecture, yet.
I had the same issue
This is because of the M1 chip. There is now a pre-release that delivers hardware-accelerated TensorFlow and TensorFlow Addons for macOS 11.0+. Native hardware acceleration is supported on M1 Macs and Intel-based Macs through Apple's ML Compute framework.
You need to install the TensorFlow build that supports the M1 chip. Simply pull this tensorflow macos repository and run ./scripts/download_and_install.sh

Install Tensorflow-GPU on WSL2

Has anyone successfully installed Tensorflow-GPU on WSL2 with NVIDIA GPUs? I have Ubuntu 18.04 on WSL2, but am struggling to get NVIDIA drivers installed. Any help would be appreciated as I'm lost.
So I have just got this running.
The steps you need to follow are here. To summarise them:
Sign up for the Windows Insider Program and get the development builds of Windows so that you have the latest version
Install WSL 2
Install Ubuntu from the Windows Store
Install the WSL 2 CUDA driver on Windows
Install the CUDA toolkit
Install cuDNN (you can download the Linux version from Windows and then copy the file to Linux)
If you are getting memory errors like 'cannot allocate memory', you might need to increase the amount of memory WSL can use
Then install tensorflow-gpu
Pray it works
Bugs I hit along the way:
If you get an error when you open Ubuntu for the first time, you need to enable virtualisation in the BIOS
If you cannot run the ./Blackscholes example in the installation instructions, you might not have the right build of Windows! You must have the right version
If you are getting 'cannot allocate memory' errors when running TF, you need to give WSL more RAM. It only accesses half your RAM by default
Create a .wslconfig file under your user directory in Windows with the amount of memory you want. Mine looks like:
[wsl2]
memory=16GB
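The same file can be generated from a script. The following is a minimal sketch: render_wslconfig and write_wslconfig are hypothetical helper names, and 16GB is simply the value from the answer above. It is meant to be run with a Python installed on the Windows side, so that the home directory is the Windows user directory.

```python
from pathlib import Path


def render_wslconfig(memory="16GB"):
    """Build the text of a .wslconfig that only sets the WSL 2 memory limit."""
    return f"[wsl2]\nmemory={memory}\n"


def write_wslconfig(user_dir, memory="16GB"):
    """Write .wslconfig into user_dir and return the path that was written."""
    path = Path(user_dir) / ".wslconfig"
    path.write_text(render_wslconfig(memory), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Path.home() is the Windows user directory when run from Windows Python.
    print(write_wslconfig(Path.home()))
```

The new limit only applies after WSL restarts, for example after running wsl --shutdown from a Windows prompt.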
Edit after running some code
This is much slower than when I was running on Windows directly. I went from 1 minute per epoch to 5 minutes. I'm just going to dual-boot.
These are the steps I had to follow for Ubuntu 20.04. I am no longer on dev channel, beta channel works fine for this use case and is much more stable.
Install WSL2
Install Ubuntu 20.04 from Windows Store
Install Nvidia Drivers for Windows from: https://developer.nvidia.com/cuda/wsl/download
Install nvcc inside of WSL with:
sudo apt install nvidia-cuda-toolkit
Check that it is there with:
nvcc --version
For my use case, I do data science and already had anaconda installed. I created an environment with:
conda create --name tensorflow
conda activate tensorflow
conda install tensorflow-gpu
Then just test it with this little python program with the environment activated:
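import tensorflow as tf

# An empty list here means TensorFlow cannot see the GPU.
print(tf.config.list_physical_devices('GPU'))

# CUDA/cuDNN versions this TensorFlow build was compiled against
# (the keys are absent on CPU-only builds, hence .get).
sys_details = tf.sysconfig.get_build_info()
cuda = sys_details.get("cuda_version")
cudnn = sys_details.get("cudnn_version")
print(cuda, cudnn)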
For reasons I do not understand, my machine was unable to find the GPU without nvcc installed, and it actually gave an error message saying it could not find nvcc.
The online tutorials I had found had you download CUDA and cuDNN separately, but I think nvcc includes cuDNN since it is . . . there somehow.
I can confirm I am able to get this working without the need for Docker on WSL2 thanks to the following article:
https://qiita.com/Navier/items/cf551908bae707db4258
Be sure to update to driver version 460.15, not 455.41 as listed in the CUDA documentation.
Note, this does not work with the card in TCC mode (only WDDM). Also, be sure to place your files on the Linux file system (i.e. not on a mount drive, like /mnt/c/). Performance is significantly faster on the Linux file system (this has to do with the difference in implementation of WSL 1 vs. WSL 2; see 1, 2, and 3).
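A quick way to catch the file-location problem is to test whether a path sits under a mounted Windows drive. This sketch assumes the default WSL layout, where drives appear as /mnt/<single letter>; the mount root can be changed in WSL's configuration, and the helper name on_windows_mount is made up for illustration.

```python
from pathlib import PurePosixPath


def on_windows_mount(path):
    """Return True if path looks like /mnt/<drive letter>/... (default WSL layout)."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "mnt"
        and len(parts[2]) == 1
        and parts[2].isalpha()
    )


if __name__ == "__main__":
    import os

    cwd = os.getcwd()
    if on_windows_mount(cwd):
        print(f"{cwd} is on a Windows drive; consider moving the data to the Linux file system")
    else:
        print(f"{cwd} is not under /mnt/<drive>")
```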
NOTE: See also Is the class generator (inheriting Sequence) thread safe in Keras/Tensorflow?
I just want to point out that using Anaconda to install cudatoolkit and cudnn does not seem to work in WSL.
Maybe there is some problem with paths that makes TF look for the needed files only in the system paths instead of the conda environments.

E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

I am using Rasa 1.9.6 on Ubuntu in VMware. I have been getting this error both when training and when running the model. Training works, but I am unable to run the model. I need to run my bot; can someone please help?
According to the Rasa forum, this issue originates from the TensorFlow and graphics card configuration. GPUs do not typically provide an advantage for Rasa models, so this error can be safely ignored.
Installing nvidia-modprobe can solve this issue.
sudo apt install nvidia-modprobe
Other solutions you can try are:
Uninstall and install CUDA and cuDNN.
Install tensorflow-gpu.
Uninstall and install different Nvidia driver versions.
The problem could also be that only some of the /dev/nvidia* files are present before running Python with sudo. Check using $ ls /dev/nvidia*. After running the Device Node verification script, the /dev/nvidia-uvm file gets added.
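The device-node check above can also be done from Python before TensorFlow is imported. This is a rough sketch: the helper names are hypothetical, and treating /dev/nvidiactl as an expected node alongside /dev/nvidia-uvm is an assumption about a typical driver install rather than something stated in the answer.

```python
import glob


def list_nvidia_device_nodes(pattern="/dev/nvidia*"):
    """Return the NVIDIA device nodes currently present, sorted by name."""
    return sorted(glob.glob(pattern))


def missing_nodes(found, expected=("/dev/nvidiactl", "/dev/nvidia-uvm")):
    """Return the expected nodes that are absent from found (expected list is an assumption)."""
    return [node for node in expected if node not in found]


if __name__ == "__main__":
    nodes = list_nvidia_device_nodes()
    print("present:", nodes)
    print("missing:", missing_nodes(nodes))
```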

Tensorflow GPU import error

I have CUDA 8.0, and I can download cuDNN. Currently, I have cuDNN version 7.0.5 for Linux.
I do not have administrator privileges.
When I tried to install TensorFlow version 1.4 for GPU, I got this error:
ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory
I figured this was due to the absence of cuDNN on my machine. I downloaded version 7.0.5, at the advice of the sysadmin, which is of course not the version the error message wanted me to get (it wanted version 6).
So I thought, I'll try Tensorflow version 1.5 for GPU. I got this error:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
What should I do? Is there a way to download older versions of cuDNN? Or a way to download cublas 9.0 somewhere?
Yes, if you register at NVIDIA you can also download older versions of cuDNN. It's a little hidden, though. Make sure you download a version that is compatible with your CUDA version. Also, don't forget to set the CUDA_HOME environment variable so that TensorFlow can find your GPU.
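The ImportError text itself names the library and version TensorFlow is looking for, which tells you which download to pick. A small sketch of pulling that out of the message follows; the function name missing_library is made up for illustration.

```python
import re

# Matches names such as libcudnn.so.6 or libcublas.so.9.0 inside an error message.
_MISSING_LIB = re.compile(r"lib(?P<name>[A-Za-z0-9_]+)\.so\.(?P<version>\d+(?:\.\d+)*)")


def missing_library(error_message):
    """Return (library, version) parsed from an ImportError message, or None."""
    match = _MISSING_LIB.search(error_message)
    if match is None:
        return None
    return match.group("name"), match.group("version")


if __name__ == "__main__":
    message = "ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory"
    print(missing_library(message))
```

For the two errors in the question, this gives cuDNN 6 for TensorFlow 1.4 and cuBLAS 9.0 (that is, CUDA 9.0) for TensorFlow 1.5.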
This is what ended up working for me:
Steps to install tensorflow-gpu on a remote machine via a local machine.
1) SSH into the remote machine via something like: ssh -X username@remote
2) Use pip install tensorflow-gpu for a first-time install of tensorflow-gpu. This will give you the most current version. If you want an older version, you can specify it with pip install tensorflow-gpu==1.4.0 (for example)
3) If you get an error, you likely either need to install CUDA or cuDNN.
To check your CUDA version:
* cd /usr/local/cuda
* vim version.txt
To download cuDNN:
Go to https://developer.nvidia.com/cudnn
Sign up for a free developer account (you will be prompted to do this via the ‘Download’ button)
Once you’ve created an account and logged in, click the box next to “I agree to the terms of the cuDNN software licence agreement”. A list of possible cuDNN versions for download will appear.
The error message from the terminal will tell you which version of cuDNN you need. For example, “libcudnn.so.6” in the error message means it’s looking for cuDNN version 6.
Click Download cuDNN v6.0 (April 27, 2017), for CUDA 8.0 (note your CUDA version must be aligned with your cuDNN version - you may not be able to use cuDNN version 6.0 with CUDA version 9.0, for example).
Click cuDNN v6.0 Library for Linux (if you have a Linux machine, and you are not trying to install cuDNN for an entire system). A download of a zipped folder will be started.
Unzip the folder and save it on your Desktop. Call the folder ‘cuda’.
Secure copy the folder and all its contents to the cluster. For example:
scp -r /Users/username/Desktop/cuda username@remote:~/path/to/a/folder/you/use
SSH into the remote server via ssh -X username@remote
Copy (or move) the cuda folder via something like: cp -r cuda /path/to/where/you/want/cuda
cd /path/to/where/you/want/cuda
echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/where/you/want/cuda/lib64' >> $HOME/.bashrc
Restart your terminal window
SSH into the remote again and try importing TensorFlow in Python. If it succeeds, great! If not, set the library path on the command line when starting Python, and that should work. You can do this with:
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/where/you/want/cuda/lib64 python
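Before importing TensorFlow, it can help to confirm that the library named in the error is actually reachable through LD_LIBRARY_PATH. The sketch below only looks in the directories listed in that variable (not in the system default locations), and find_in_library_path is a hypothetical helper name.

```python
import os
from pathlib import Path


def find_in_library_path(filename, library_path=None):
    """Return every LD_LIBRARY_PATH entry (as a full path) that contains filename."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries produced by leading or doubled separators
        candidate = Path(entry) / filename
        if candidate.exists():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    # libcudnn.so.6 is the file named in the question's first error message.
    print(find_in_library_path("libcudnn.so.6") or "not found on LD_LIBRARY_PATH")
```

An empty result after editing .bashrc usually means the new shell has not picked up the change yet or the path in the export line is wrong.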

Tensorflow with gpu support installation error - the specified --crosstool_top is not a valid cc_toolchain_suite rule

I've been trying to install tensorflow with GPU support using these steps:
http://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-installation.html
and also using:
http://thelazylog.com/install-tensorflow-with-gpu-support-on-sandbox-redhat/
This is the error message that I'm getting when I try to run the bazel build command for building the tensorflow pip package (with the --config=cuda flag set):
The specified --crosstool_top '//third_party/gpus/crosstool:crosstool' is not a valid cc_toolchain_suite rule.
What's strange is that if I remove the --config=cuda flag, I don't get the error message while building and I'm able to install tensorflow successfully, but without GPU support.
I experienced the same issue, using the nvidia instructions. What I did was to drop the git reset line in the instructions, and it works.
Details (from the error message):
Close, reopen terminal
Run git clone (again), and cd tensorflow
Run ./configure
Bazel build, etc
This may be unrelated, but I experienced an issue with the .whl line: the error message said that the wheel could not be found, or something along those lines. This is in the "And finally install the TensorFlow pip package" section. To resolve it in my case, I typed the path in the terminal all the way to "..._pkg/tensorflow" and then pressed Tab for auto-completion. The file name that popped up was significantly longer than the one in the guide, but it worked. Also, if anyone gets a "numpy not installed" message when following the nvidia instructions, replace python-pip and dev with python-numpy and run that line again to install it.
Configuration: Fresh Ubuntu 16.04, GTX970M, running driver 367.48 (from CUDA installation), CUDA 8.0, CuDNN 5.1
Full setup path:
Fresh Ubuntu, with downloads and 3rd party apps selected during installation.
Control panel => Software and updates => Other Software => Canonical ticked
Install CUDA using nvidia instructions in CUDA documentation, .deb format
CuDNN 5.1 installed, the rest from the nvidia link.
I hope everything works out for you!
(I'm sorry for the poor formatting)
I was going through the same problem and recently found the solution. The problem is with the installation of Bazel, which leads to this kind of error.
After installing Bazel from the installer, make sure that you add the correct path to ~/.bashrc and also activate the path using
source "path-to-your-bin-directory-for-bazel"
Please change the git source version slightly as shown below
$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow
# $ git reset --hard 70de76e
$ git reset --hard 287db3a
Please also refer to the link below:
https://github.com/tensorflow/tensorflow/issues/4944
Also, zlib has been updated since this TF build. You need to check http://www.zlib.net/ to get the latest version and SHA-256, then update tensorflow/workspace.bzl with that information (lines 254-266 in this build). At this time, the correct version info would include the following:
url = "http://zlib.net/zlib-1.2.11.tar.gz",
sha256 = "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1",
strip_prefix = "zlib-1.2.11",
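Since the build fails if the sha256 value does not match the downloaded archive, it can be worth checking the hash locally before editing workspace.bzl. This is a minimal sketch using hashlib; the function names and the local file name are placeholders, and the expected digest is the one quoted above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's digest equals expected_hex (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage (file name is a placeholder): python check_sha.py zlib-1.2.11.tar.gz
    expected = "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1"
    if len(sys.argv) > 1:
        print("match" if matches_expected(sys.argv[1], expected) else "MISMATCH")
```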