How to use a remote machine's GPU in Jupyter Notebook - TensorFlow

I am trying to run TensorFlow on a remote machine's GPU through Jupyter Notebook. However, when I print the available devices with tf, I only get CPUs. I have never used a GPU before and am also relatively new to using conda and Jupyter Notebook remotely, so I am not sure how to set up the GPU in Jupyter Notebook.
I am using an environment set up by someone else who already ran the same code on the same GPU, but they did it via a Python script, not in a Jupyter notebook.
This is the only code in the other person's file that had anything to do with the GPU:
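import tensorflow as tf
from keras.backend.tensorflow_backend import set_session  # assumed: standalone Keras on TF 1.x

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
set_session(tf.Session(config=config))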
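For comparison, here is a rough TF 2.x counterpart of the snippet above. It is a sketch that assumes TensorFlow 2.1 or later, where tf.ConfigProto and tf.Session only survive under tf.compat.v1. The function name enable_memory_growth is hypothetical. If you run it in the first notebook cell, it also returns the GPUs TensorFlow can see, and an empty list corresponds to the CPU-only symptom described in the question.

```python
def enable_memory_growth():
    """TF 2.x counterpart of allow_growth=True; returns the GPU device names.

    Returns [] when TensorFlow is not installed or no GPU is visible. It has to
    run before any op uses the GPU, otherwise TensorFlow raises RuntimeError.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate GPU memory on demand instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    return [gpu.name for gpu in gpus]


print("GPUs visible to TensorFlow:", enable_memory_growth())
```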

I think the problem was that I had tensorflow in my environment instead of tensorflow-gpu. But now I get the message "cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version", and I don't know how to update the driver through the terminal.

How is your environment set up? Specifically, what is your remote environment, and what is your local environment? It sounds like your CUDA drivers are out of date, but it could be more than just that. If you are just getting started, I would recommend finding an environment that requires little to no configuration work on your part, so you can get started more quickly.
For example, you can run GPUs in the cloud and connect to them via a local terminal. You can also use Colab as your "local" frontend by connecting it to a local runtime. (This video explains that particular setup, but there are lots of other options.)
You may also want to try running nvidia-smi on the remote machine to see if the GPUs are visible.
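The nvidia-smi check can also be scripted, which makes it easy to compare the driver version against what your CUDA runtime needs. This is a minimal sketch that assumes nvidia-smi is on PATH. The helper names are hypothetical, and the minimum version is an input you look up in NVIDIA's CUDA compatibility notes; the code itself does not know it.

```python
import shutil
import subprocess


def parse_driver_versions(csv_text):
    """Parse 'name, driver_version' lines as printed by
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, version = line.rsplit(",", 1)  # split on the last comma only
        gpus.append((name.strip(), version.strip()))
    return gpus


def version_tuple(version):
    """'465.19.01' -> (465, 19, 1), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(version, minimum):
    return version_tuple(version) >= version_tuple(minimum)


def query_gpus():
    """Return [(name, driver_version), ...]; empty if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_driver_versions(result.stdout)


for name, version in query_gpus():
    print(name, version)
```

Comparing tuples rather than raw strings avoids treating "465.9" as newer than "465.19".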

Here is another solution that describes how to set up a GPU JupyterLab instance with Docker.
To update your drivers via the terminal, run:
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot
Are your CUDA paths set appropriately, for example like this?
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
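The ${VAR:+:${VAR}} idiom in these exports adds the separating colon only when the variable is already non-empty. Below is a minimal Python sketch of the same logic. It assumes CUDA lives under /usr/local/cuda, as in the exports, and the helper names are hypothetical. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the sketch builds the environment for a new process (for example, the Jupyter server) rather than patching an already running kernel.

```python
import os


def prepend_path(entry, current):
    """Mimic bash entry${VAR:+:${VAR}}: add ':' + VAR only when VAR is non-empty."""
    return entry + (":" + current if current else "")


def cuda_env(cuda_home="/usr/local/cuda", base=None):
    """Return a copy of the environment with CUDA's bin and lib64 prepended."""
    env = dict(os.environ if base is None else base)
    env["PATH"] = prepend_path(cuda_home + "/bin", env.get("PATH", ""))
    env["LD_LIBRARY_PATH"] = prepend_path(
        cuda_home + "/lib64", env.get("LD_LIBRARY_PATH", "")
    )
    return env


# Usage (not run here): start the notebook server with this environment.
# import subprocess
# subprocess.run(["jupyter", "notebook", "--no-browser"], env=cuda_env())
```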

Related

SSH Jupyter notebook using non-base Conda environment?

My problem is the following: I want to run a Jupyter notebook on my remote desktop and access it from my laptop elsewhere. I have accomplished this, but I can't use my GPU for TensorFlow because the GPU-enabled version is only installed in my custom, non-base environment. Even though all of my installed Jupyter kernels are available, things don't seem to work unless I run 'jupyter notebook' from within the correct activated conda environment (it says "no GPU" even though I select the kernel in which tensorflow-gpu is installed).
Is there a simple way of running jupyter notebook from within that environment with a batch script? I also need it to run the notebook on a secondary drive.
I could of course just start up the server while at home and then access it using the token, but that's a little clumsy.
I've found a solution. On Windows, in %AppData%\Microsoft\Windows\Start Menu\Programs\Anaconda3 (%AppData% already points to AppData\Roaming), there are shortcuts for various Anaconda-related programs, including Jupyter Notebook for each environment.
The shortcut for Jupyter Notebook for my given env is:
E:\Software\Anaconda3\python.exe E:\Software\Anaconda3\cwp.py E:\Software\Anaconda3\envs\tf E:\Software\Anaconda3\envs\tf\python.exe E:\Software\Anaconda3\envs\tf\Scripts\jupyter-notebook-script.py "%USERPROFILE%"
I modified this to end in '"E:" --no-browser' instead of the "%USERPROFILE%" part and made that into a script. Now when I SSH into the computer and run this script, the notebook starts within the correct environment, I have access to my GPU, and everything is on the correct drive, E.
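The same command line can be rebuilt from a script instead of copied by hand. This is a sketch that assumes the Anaconda layout shown above (root E:\Software\Anaconda3, environment tf). notebook_command is a hypothetical helper. cwp.py is assumed to prepare the environment roughly the way activation does before starting the given interpreter, which is presumably why launching through it makes the GPU visible.

```python
import ntpath  # Windows path rules, usable from any OS


def notebook_command(anaconda_root, env_name, notebook_dir):
    """Rebuild the Start Menu shortcut's command line, ending in
    notebook_dir and --no-browser instead of %USERPROFILE%."""
    env_dir = ntpath.join(anaconda_root, "envs", env_name)
    return [
        ntpath.join(anaconda_root, "python.exe"),
        ntpath.join(anaconda_root, "cwp.py"),  # wrapper assumed to prepare env_dir
        env_dir,
        ntpath.join(env_dir, "python.exe"),
        ntpath.join(env_dir, "Scripts", "jupyter-notebook-script.py"),
        notebook_dir,
        "--no-browser",
    ]


# Usage on the Windows machine (not run here):
# import subprocess
# subprocess.Popen(notebook_command(r"E:\Software\Anaconda3", "tf", "E:"))
```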

Google Cloud Deep Learning On Linux VM throws Unknown Cuda Error

I am trying to set up a deep learning VM on Google Cloud, but I keep running into the same issue over and over again.
I follow all the steps: I set up an n1-highmem-8 (8 vCPUs, 52 GB memory) instance, add a single T4 GPU, and select the "Deep Learning Image: TensorFlow 2.4 m69 CUDA 110" image. That's it.
After that, I SSH into the VM and run the script that installs all the NVIDIA drivers, and when I begin using it by simply running
from tensorflow.keras.layers import Input, Dense
i = Input((100,))
x = Dense(500)(i)
I keep getting "failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error". At that point I haven't installed anything or done anything custom; it's just the vanilla image from GCP.
What is more concerning is that even if I delete the VM and create a new one with the same config, sometimes the error doesn't happen immediately and sometimes it's present right off the bat.
Has anyone encountered this? I've googled around to see if anyone has faced this issue, and while I came across suggestions, all of them are old and none of them worked for me. Moreover, suggestions on the NVIDIA support forums tell me to reinstall everything, and the whole point of using a prebuilt GCP image made specifically for deep learning is that I don't have to enter the hell of installing and resolving issues with NVIDIA drivers.
The issue is fixed in the M74 image, but you are using M69, so follow one of the two fixes provided in the Google Cloud public forum.
You can mitigate the issue by:
Fix #1: Use the latest DLVM image (M74 or later) in a new VM instance: They have released a fix for the newest DLVM image in M74 so you will no longer be affected by this issue.
Fix #2: Patch your existing instance running images older than M74.
Run the following via an SSH session on the affected instance:
gsutil cp gs://dl-platform-public-nvidia/b191551132/restart_patch.sh /tmp/restart_patch.sh
chmod +x /tmp/restart_patch.sh
sudo /tmp/restart_patch.sh
sudo service jupyter restart
This only needs to be done once, and does not need to be rerun each time the instance is rebooted.
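If you prefer to run the patch from a script, you can wrap the four commands so that execution stops at the first failing step. This is a sketch: PATCH_STEPS simply mirrors the commands above, apply_patch is a hypothetical helper, and the default dry_run=True only prints the commands.

```python
import subprocess

PATCH_SCRIPT = "/tmp/restart_patch.sh"
PATCH_STEPS = [
    ["gsutil", "cp", "gs://dl-platform-public-nvidia/b191551132/restart_patch.sh", PATCH_SCRIPT],
    ["chmod", "+x", PATCH_SCRIPT],
    ["sudo", PATCH_SCRIPT],
    ["sudo", "service", "jupyter", "restart"],
]


def apply_patch(dry_run=True):
    """Run the patch steps in order; check=True stops at the first failure."""
    for step in PATCH_STEPS:
        if dry_run:
            print(" ".join(step))
        else:
            subprocess.run(step, check=True)  # raises CalledProcessError on failure
    return PATCH_STEPS


# apply_patch(dry_run=False)  # run once on the affected instance
```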

How can I run ImageAI on my GPU and not my CPU

I am quite new to this. I tried to find answers on Google, but nothing has worked so far. I am trying to run the ImageAI library.
I am able to run it normally on the CPU (at least I think it runs on the CPU) by just calling python test.py. Am I correct there?
But since model prediction takes a long time, I would like to run it on my GPU. What I tried was to create a conda environment and activate it, but after I do, I get this error:
ModuleNotFoundError: No module named 'imageai.Classification'
This happens even though imageai is installed in my environment, as this command shows:
pip freeze | findstr imageai
imageai==2.1.5
What am I doing wrong here?
I found the solution: it doesn't require the conda environment. ImageAI automatically runs on the GPU if one is available; all you need to do is make sure you have the GPU version of TensorFlow installed.
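To confirm which interpreter a script or notebook kernel is actually using, and which TensorFlow package that interpreter can see, a short check like the following may help. It is a sketch that assumes Python 3.8+ (for importlib.metadata). The packages checked are just the ones mentioned in this thread, and environment_report is a hypothetical helper.

```python
import sys
from importlib import metadata

CANDIDATES = ("tensorflow", "tensorflow-gpu", "imageai")


def environment_report(candidates=CANDIDATES):
    """Return the running interpreter and the installed version of each package
    (None when the package is not installed for this interpreter)."""
    versions = {}
    for name in candidates:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": sys.executable, "packages": versions}


print(environment_report())
```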

Stopping and starting a deep learning google cloud VM instance causes tensorflow to stop recognizing GPU

I am using the prebuilt deep learning VM instances offered by Google Cloud, with an NVIDIA Tesla K80 GPU attached. I chose to have TensorFlow 2.5 and CUDA 11.0 automatically installed. When I start the instance, everything works great. I can run:
import tensorflow as tf
tf.config.list_physical_devices()
The call returns the CPU, the accelerated CPU, and the GPU. Similarly, if I run tf.test.is_gpu_available(), the function returns True.
However, if I log out, stop the instance, and then restart the instance, running the same exact code only sees the CPU and tf.test.is_gpu_available() results in False. I get an error that looks like the driver initialization is failing:
E tensorflow/stream_executor/cuda/cuda_driver.cc:355] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
Running nvidia-smi shows that the computer still sees the GPU, but my tensorflow can’t see it.
Does anyone know what could be causing this? I don’t want to have to reinstall everything when I’m restarting the instance.
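You can tell this symptom (nvidia-smi sees the GPU, TensorFlow does not) apart from a missing driver by comparing the two views. This is a heuristic sketch, not an exhaustive diagnosis. The function names are hypothetical. The sketch assumes that nvidia-smi -L prints one line starting with "GPU " per device, and that TensorFlow 2.1+ provides tf.config.list_physical_devices.

```python
import shutil
import subprocess


def smi_gpu_count():
    """Count GPUs listed by nvidia-smi -L (0 if the tool is missing or fails)."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return 0
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def tf_gpu_count():
    """Count GPUs TensorFlow can see (0 if TensorFlow is not importable)."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    return len(tf.config.list_physical_devices("GPU"))


def diagnose(smi_count, tf_count):
    """Heuristic: which layer is most likely broken."""
    if smi_count == 0:
        return "driver: nvidia-smi sees no GPU, so look at the NVIDIA driver first"
    if tf_count == 0:
        return "runtime: the driver sees the GPU but TensorFlow does not (CUDA init or library issue)"
    return "ok"


# Usage (run on the VM after a restart):
# print(diagnose(smi_gpu_count(), tf_gpu_count()))
```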
Some people (sadly not me) are able to resolve this by setting the following at the beginning of their script/main:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
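CUDA reads CUDA_VISIBLE_DEVICES when it initializes in a process, so the variable has to be set before TensorFlow touches the GPU. The safest place is before import tensorflow. The minimal sketch below sets the variable and flags the case where TensorFlow was already imported. pin_visible_gpus is a hypothetical helper, and its check is deliberately conservative.

```python
import os
import sys


def pin_visible_gpus(devices="0"):
    """Set CUDA_VISIBLE_DEVICES and report whether it was set early enough.

    Returns False if tensorflow is already in sys.modules; in that case the
    setting may be ignored, so restart the kernel and run this cell first.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    return "tensorflow" not in sys.modules


if not pin_visible_gpus("0"):
    print("TensorFlow was imported first; restart the kernel and run this cell first.")

# import tensorflow as tf  # only after the variable has been set
```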
I had to reinstall the CUDA drivers, and from then on it worked even after restarting the instance. On NVIDIA's website you can select your system configuration, and it will give you the commands you need to install CUDA. It also asks whether you want to uninstall the previous CUDA version (yes!). Luckily, this is also very fast.
I fixed the same issue with the commands below, taken from https://issuetracker.google.com/issues/191612865?pli=1
gsutil cp gs://dl-platform-public-nvidia/b191551132/restart_patch.sh /tmp/restart_patch.sh
chmod +x /tmp/restart_patch.sh
sudo /tmp/restart_patch.sh
sudo service jupyter restart
Option-1:
Upgrade the Notebooks instance's environment. Refer to the link for upgrade instructions.
Notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Option-2:
Connect to the notebook VM via SSH and run the commands from the link.
After running the commands, the CUDA version will be updated to 11.3 and the NVIDIA driver version to 465.19.01.
Restart the notebook VM.
Note: The issue has been solved in the GPU images, and new notebooks will be created with image version M74. The new image version is not yet mentioned in the Google public issue tracker, but you can find image version M74 in the console.

WSL2 - nvidia-smi command not running

I have Ubuntu 18.04 LTS installed inside WSL2, and I was able to use the GPU. I can run nvidia-smi from a Windows terminal.
However, I don't get any result when I run nvidia-smi in WSL2.
The fix is now available in the NVIDIA docs:
cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi
chmod ogu+x /usr/bin/nvidia-smi
Source: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations
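A script can also look for nvidia-smi on PATH and fall back to the WSL location used in the fix above. This is a sketch: find_nvidia_smi is a hypothetical helper, and it assumes the /usr/lib/wsl/lib/nvidia-smi path from the NVIDIA docs.

```python
import os
import shutil

WSL_NVIDIA_SMI = "/usr/lib/wsl/lib/nvidia-smi"


def find_nvidia_smi(wsl_path=WSL_NVIDIA_SMI):
    """Return a usable nvidia-smi path: PATH first, then the WSL library copy."""
    on_path = shutil.which("nvidia-smi")
    if on_path:
        return on_path
    if os.path.isfile(wsl_path) and os.access(wsl_path, os.X_OK):
        return wsl_path
    return None


print(find_nvidia_smi())
```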
From the known limitations in NVIDIA's documentation:
NVIDIA Management Library (NVML) APIs are not supported. Consequently, nvidia-smi may not be functional in WSL 2.
However, you should be able to run https://docs.nvidia.com/cuda/wsl-user-guide/index.html#unique_1238660826
EDIT: Since this answer was written, nvidia-smi has become supported, starting with driver 465.42.
I am running it well using 470.57.02.
When I was installing CUDA 11.7.1 in WSL2, a similar error was raised:
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
I fixed it by updating Windows to version 21H2 (build 19044).
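Here is a sketch for checking the Windows build before installing. It uses the threshold from this answer (19044, i.e. Windows 10 21H2) as an assumption rather than as an official minimum. From inside WSL, the Windows version string can typically be obtained with cmd.exe /c ver. On Windows itself, platform.version() returns it. The helper names are hypothetical.

```python
import re


def windows_build(version_text):
    """Extract the build number from text such as '10.0.19044' (platform.version()
    on Windows) or 'Microsoft Windows [Version 10.0.19044.1889]' (cmd.exe /c ver)."""
    match = re.search(r"10\.0\.(\d+)", version_text)
    return int(match.group(1)) if match else None


def build_at_least(version_text, minimum=19044):
    """True if the build is known and >= minimum (19044 = Windows 10 21H2)."""
    build = windows_build(version_text)
    return build is not None and build >= minimum
```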