I don't have sudo access to the remote pc where cuda is already installed. Now, I have to install tensorflow-gpu on that system. Please give me the step by step guide to install it without sudo.
Operating System : Ubuntu 18.04
I had to do this before. Basically, I installed miniconda (you can also use anaconda, same thing and installation works without sudo), and installed everything using conda.
Create my environment and activate it:
conda create --name myenv python=3.6.8
conda actiavate myenv
Install the CUDA things and Tensorflow
conda install cudatoolkit=9.0 cudnn=7.1.2 tensorflow-gpu
Depending on your system, you may need to change version numbers.
Not sure how familiar you are with conda - it is basically a package-manager/repository and environment manager like pip/venv with the addition that it can handle non-python things as well (such as cudnn for example). As a note - if a package is not availabe through conda, you can still use pip as a fallback.
Untested with pip
I previously tried to do it without conda and using pip (I ended up failing due to some version conflicts, got frustrated with the process and moved to conda). It gets a little more complicated since you need to manually install it. So first, download cudnn from nvidia and unpack it anywhere you want. Then, you need to add it to the LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/path/to/cuda/lib64:/path/to/cudnn/lib64/:${LD_ LIBRARY_PATH}
Related
I'm trying to use my GPU (Nvidia GeForce) to run my python code in spyder, and I don't know the way to do it.
any help please !
Most cases you need these things:
Be sure that your GPU is compatible from:
https://developer.nvidia.com/cuda-gpus
Download Cuda ToolKit:
https://developer.nvidia.com/cuda-downloads
You can check by nvidia-smi command if it works
Install conda install cudatoolkit
You may also need conda install numba depending on your code
Also checkout: https://www.anaconda.com/blog/getting-started-with-gpu-computing-in-anaconda
And if you don't use anaconda you can download packages via pip checkout: https://documen.tician.de/pycuda/
Has anyone successfully installed Tensorflow-GPU on WSL2 with NVIDIA GPUs? I have Ubuntu 18.04 on WSL2, but am struggling to get NVIDIA drivers installed. Any help would be appreciated as I'm lost.
So I have just got this running.
The steps you need to follow are here. To summarise them:
sign up for windows insider program and get the development builds of windows so that you have the latest version
Install wsl 2
Install Ubuntu from the windows store
Install the wsl 2 cuda driver on windows
Install cuda toolkit
Install cudnn (you can download the linux version from windows and then copy the file to linux)
If you are getting memory errors like 'cannot allocate memory' then you might need to increase the amount of memory wsl can get
Then install tensorflow-gpu
pray it works
bugs I hit along the way:
If when you open ubuntu for the first time you get an error you need to enable virutalisation in the bios
If you cannot run the ./Blackscholes example in the installation instructions you might not have the right build of windows! You must have the right version
if you are getting 'cannot allocate memory' errors when running tf you need to give wsl more ram. It only access half your ram by default
create a .wslconfig file under your user directory in windows with the amount of memory you want. Mine looks like:
[wsl2]
memory=16GB
Edit after running some code
This is much slower then when I was running on windows directly. I went from 1 minute per epoch to 5 minutes. I'm just going to dualboot.
These are the steps I had to follow for Ubuntu 20.04. I am no longer on dev channel, beta channel works fine for this use case and is much more stable.
Install WSL2
Install Ubuntu 20.04 from Windows Store
Install Nvidia Drivers for Windows from: https://developer.nvidia.com/cuda/wsl/download
Install nvcc inside of WSL with:
sudo apt install nvidia-cuda-toolkit
Check that it is there with:
nvcc --version
For my use case, I do data science and already had anaconda installed. I created an environment with:
conda create --name tensorflow
conda install tensorflow-gpu
Then just test it with this little python program with the environment activated:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
sys_details = tf.sysconfig.get_build_info()
cuda = sys_details["cuda_version"]
cudnn = sys_details["cudnn_version"]
print(cuda, cudnn)
For reasons I do not understand, my machine was unable to find the GPU without installing the nvcc and actually gave an error message saying it could not find nvcc.
Online tutorials I had found which had you downloading CUDA and CUDNN separately but I thinkNVCC includes CUDNN since it is . . . there somehow.
I can confirm I am able to get this working without the need for Docker on WSL2 thanks to the following article:
https://qiita.com/Navier/items/cf551908bae707db4258
Be sure to update to driver version 460.15, not 455.41 as listed in the CUDA documentation.
Note, this does not work with the card in TCC mode (only WDDM). Also, be sure to place your files on the Linux file system (i.e. not on a mount drive, like /mnt/c/). Performance is significantly faster on the Linux file system (this has to do with the difference in implementation of WSL 1 vs. WSL 2; see 1, 2, and 3).
NOTE: See also Is the class generator (inheriting Sequence) thread safe in Keras/Tensorflow?
I just want to point out that using anaconda to install cudatoolkit and cudnn does not seem to work in wsl.
Maybe there is some problem with paths that make TF look for the needed files only in the system paths instead of the conda enviroments.
I am building a Deep Learning rig with a GeForce RTX 2060.
I am wanting to use baselines-stable which isn't tensorflow 2.0 compatible yet.
According to here and here, tensorflow-gpu-1.15 is only listed as compatible with CUDA 10.0, not CUDA 10.1.
Attempting to download CUDA from Nvidia, the option for Ubuntu 20.04 is not available for CUDA 10.0.
Searching the apt-cache does not result in CUDA 10.0 either.
$ sudo apt-cache policy nvidia-cuda-toolkit
[sudo] password for lansford:
nvidia-cuda-toolkit:
Installed: (none)
Candidate: 10.1.243-3
Version table:
10.1.243-3 500
500 http://us.archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages
I would highly prefer not to have to reinstall the OS with an older version of Ubuntu. However experimenting with reinforcement learning was the motive for purchasing this PC.
I see some possible clues that it might be possible to build tensorflow-gpu-1.15 from source with cuda 10.1 support. I also saw a random comment that tensorflow-gpu-1.15 will just-work with tf 1.15, but I am not wanting to make a miss-step installing things until I have a signal that is the direction to go. Uninstalling things isn't always straightforward.
Should I install CUDA 10.1 and cross my fingers 1.15 will like it.
Should I download the install for CUDA 10.0 for a the older Ubuntu version and see if it will install anyway
Should I attempt to compile tensorflow from source against CUDA 10.1 (heh heh heh)
Should I install and older version of Ubuntu and hope I don't go obsolete too quickly.
Given the situation is there a way to run tensorflow 1.15 with gpu support on Ubuntu 20.04.1?
As this also bothered me I found a working solution that I think is more versatile than using docker containers.
The main idea is from here (not to claim credit from others).
To make a working solution for Ubuntu 20.04 and TensorFlow 1.15 one needs:
Cuda 10.0 (to work with tf 1.15).
I have some trouble finding this version because it's not officially available for Ubuntu 20.04. I resolved to the Ubuntu 18.04 version though which works fine.
Archive toolkits here.
Final toolkit for Ubuntu here (as it's obvious not 20.04 version is available).
I chose runfile as method which resulted into 1 main runfile and 1 patch runfile being available:
cuda_10.0.130_410.48_linux.run
cuda_10.0.130.1_linux.run
The toolkit can be safely installed using the instructions provided with no risk since each version allocates a different folder in the system (typically this would be /usr/local/cuda-10.0/).
The corresponding cudnn for cuda 10.0
I had this one from a previous installation but its shouldn't be hard to download it also. The version I used is cudnn-10.0-linux-x64-v7.6.5.32.tgz.
Cudnn basically just copies files in the right places (do not actually install anything that is). So, an extraction of the compressed file and copy to the folder would suffice:
$ sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64
$ sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
Upto this point although installed the system is unaware of the presence of cuda 10.0. So, all call to it will fail as if non existent. We should update the relevant system environment for cuda 10.0. One way (there are others) system-wide is to create (in not existent) a /etc/profile.d/cuda.sh which will contain the update to the LD_LIBRARY_PATH variable. It should contain something like:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-11.3/lib64:/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
This command would normally do the work:
$ sudo sh -c ‘echo export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-11.3/lib64:/usr/local/cuda-10.0/lib64:\$LD_LIBRARY_PATH > /etc/profile.d/cuda.sh’
This requires a restart though to be evaluated I think. Anyway, this way the system will search for the relevant so files in:
a) /usr/local/cuda/lib64 (the default symbolic link) and it will fail
b) to the virtually same as the latter /usr/local/cuda-11.3/lib64 and also fail BUT it will search also
c) /usr/local/cuda-10.0/lib64 which will be successful.
The supported versions of python for cuda 10.0 ends with 3.7 so an older version should be installed. This means obligatory a virtual environment (since messing with system python is never not a good idea).
One can install python 3.7 for example using this repository which contains old (and new versions of python):
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get install python3.7
This just installs python3.7 to the system it does not make it default. The default is the previous one.
Create a virtual environment and add the desired python as the default interpreter. For me this works:
virtualenv -p python3.7 ~/tensorflow_1-15
which creates a new venv with Python 3.7 in it.
Now populate with all required modules and you are set to go.
I went ahead and went with the docker approach. The Tensorflow documentation seems to be pushing in that direction anyway. Using docker only the Nvidia driver needs to be installed. You do need to have nvidia support installed in docker for it to work.
This contains the CUDA environment with the Tensorflow version so I can work with 1.15 and with the latest 2.x versions of Tensorflow on the same computer which require different CUDA versions.
It doesn't install anything besides docker stuff to get messy on the computer and difficult to pull back out.
I can still install Tensorflow natively on the computer at some point in the future when the libraries become availabe without compiling from source.
Here is the command which launches jupyter and mounts the current directory from my computer to /tf/bob which shows up in jupyter.
docker run -it --mount type=bind,source="$(pwd)",target=/tf/bob -u $(id -u):$(id -g) -p 8888:8888 tensorflow/tensorflow:1.15.2-gpu-py3-jupyter
previously, I installed the tensorflow 1.13 in my machine.
There are some projects depending on different version of tensorflow and I do not want to mixed up different version of tensowflow.
So I just tried create a env called tf2.0 and used pip to install tensorflow 2.0.0b1 in that specific virtual environment.
However, after I ran 'pip install tensorflow-gpu==2.0.0b1` in that "tf2.0" conda environment, I found that it takes effect globally, which mean I have to use tensorflow-gpu 2.0.0b1 even when that virtual env "tf2.0" disactivated.
I wish I could use tensorflow 1.13 when virtual env is deactivated.
It's hard to troubleshoot the described conditions without more details (exact commands run, showing PATH before and after and post activation, etc.). Nevertheless, you can try switching to following the most recent recommendations for mixing Conda and Pip. Namely, avoid installing things ad hoc, which is prone to using the wrong pip and clobbering packages, but instead define a YAML file and always create the whole env in one go.
As a minimal example:
my_env.yaml
name: my_env
channels:
- defaults
dependencies:
- python
- pip
- pip:
- tensorflow-gpu==2.0.0b1
which can be created with conda env create -f my_env.yaml. Typically, it is best to include everything possible in the "non-pip" section of dependencies.
It is mostly that you used a wrong pip. To make sure you are using correct pip, it is usually a good practice to do
python -m pip install —user PACKAGE_NAME
Given that you have conda, pip should be the last resort.
Conda channel conda-forge most likely has the latest package version you are looking for.
conda install -c conda-forge PACKAGE_NAME
If you have to use pip, make sure you are in an environment and that environment has its own pip.
conda create -n test python=3.7
conda activate test
python -m pip install PACKAGE_NAME
From your described problem, I can guess that your environment is not activated in which you are trying to install the tensorflow2.0
Please make sure to activate the environment after making it.
so after creating the environment do this-
conda activate tf2.0
make sure you see this
(tf2.0) C:\Users\XYZ>
And then you install your tensorflow.
For the first time I'v installed tensorflow with conda installation. Then I actually work with a seq2seq model. After that I have again installed the tensorflow with the pip installation. But now the libraries are very different. All the old scripts are misplaced etc. Why is that ? Why I didn't face this when I was working with coda instillation
It has been claimed that Tensorflow installed with Conda performs a lot faster than a Pip installation, for example:
https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c
Conda also installs all of the package dependencies automatically, which Pip does not, as far as I'm aware.
https://www.anaconda.com/blog/developer-blog/tensorflow-in-anaconda/
Pip and conda install to two different locations. You should try to stick to one or the other. I would recommend uninstalling the conda version and sticking to pip but it's up to you how to proceed.
Update 01-02-2019: It seems that conda is now the faster and preferred way to install tensorflow. Note this may change again in the future.