How to install the NVIDIA Transfer Learning Toolkit in Google Colab? - google-colaboratory

I tried the solutions available online but none of them worked. NVIDIA TLT is distributed as a Docker image, so is it possible at all in Colab?
docker pull nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2

Unfortunately, it is not possible as of now because Google Colab does not support Docker. I tried running this notebook to install Docker, without success.
You can refer to Installing Docker on Google Colab for the details.
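For what it's worth, you can confirm from inside a Colab cell that no Docker CLI (or daemon) is available on the hosted runtime. A minimal check, assuming a standard hosted Colab runtime rather than a local one:
import shutil
import subprocess

# Look for a docker binary on the runtime's PATH; on hosted Colab this is None.
docker_path = shutil.which("docker")
print("docker binary:", docker_path)

# Even if a client were present, there is no Docker daemon to talk to,
# so any docker command would fail with a connection error.
if docker_path:
    result = subprocess.run([docker_path, "info"], capture_output=True, text=True)
    print(result.returncode, result.stderr[:200])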

Related

Jupyter Notebook kernel dies when importing tensorflow

I am using a MacBook Air with the M1 chip. When trying to import tensorflow in a Jupyter notebook, the kernel dies and displays a prompt saying "Kernel has died and will restart in some time". Could someone help me fix this?
Tensorflow version - 2.5.0
Python version - 3.8.8
Try running the notebook file within VS Code; there are extensions to help with that. Also check this article on how to install TensorFlow on the M1: https://towardsdatascience.com/installing-tensorflow-on-the-m1-mac-410bb36b776
It seems this is a recurring issue for multiple people with M1 Macs. Since the chip is still fairly new, it is possible that Jupyter Notebook doesn't fully support it yet. Try using Anaconda Navigator under x86 emulation (Rosetta). Here is a link to a forum post with people having the same problem.
https://github.com/apple/tensorflow_macos/issues/45
Anaconda and upgrading to new M1 Mac
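Before changing environments, it may also be worth confirming whether the kernel that dies is running a native arm64 Python at all: the standard tensorflow 2.5.0 wheel from PyPI has no macOS arm64 build, and an interpreter/wheel architecture mismatch is a common way to crash the kernel on import. A small diagnostic, assuming you run it in the same Jupyter kernel that keeps dying:
import platform
import sys

# 'arm64' means a native Apple Silicon interpreter; 'x86_64' means it runs under Rosetta.
print("machine:", platform.machine())
print("python:", sys.version)
# If this prints x86_64, the kernel is emulated and the Apple tensorflow_macos
# build linked above is the one to install instead of the stock PyPI wheel.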

Stopping and starting a deep learning Google Cloud VM instance causes TensorFlow to stop recognizing the GPU

I am using the pre-built deep learning VM instances offered by Google Cloud, with an NVIDIA Tesla K80 GPU attached. I chose to have TensorFlow 2.5 and CUDA 11.0 automatically installed. When I start the instance, everything works great - I can run:
import tensorflow as tf
tf.config.list_physical_devices()
And the call returns the CPU, accelerated CPU, and GPU devices. Similarly, if I run tf.test.is_gpu_available(), the function returns True.
However, if I log out, stop the instance, and then restart it, the exact same code only sees the CPU and tf.test.is_gpu_available() returns False. I get an error that suggests the driver initialization is failing:
E tensorflow/stream_executor/cuda/cuda_driver.cc:355] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
Running nvidia-smi shows that the machine still sees the GPU, but TensorFlow can't see it.
Does anyone know what could be causing this? I don't want to have to reinstall everything every time I restart the instance.
Some people (sadly not me) are able to resolve this by setting the following at the beginning of their script/main:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
I had to reinstall the CUDA drivers, and from then on it worked even after restarting the instance. You can select your system configuration on NVIDIA's website and it will give you the commands you need to run to install CUDA. The installer also asks whether you want to uninstall the previous CUDA version (yes!). Luckily, this is also very fast.
I fixed the same issue with the commands below, taken from https://issuetracker.google.com/issues/191612865?pli=1
gsutil cp gs://dl-platform-public-nvidia/b191551132/restart_patch.sh /tmp/restart_patch.sh
chmod +x /tmp/restart_patch.sh
sudo /tmp/restart_patch.sh
sudo service jupyter restart
Option-1:
Upgrade the Notebooks instance's environment. Refer to the link for the upgrade steps.
Notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Option-2:
Connect to the notebook VM via SSH and run the commands from the link.
After the commands are executed, the CUDA version is updated to 11.3 and the NVIDIA driver version to 465.19.01.
Restart the notebook VM.
Note: the issue has been fixed in the GPU images, and new notebooks will be created with image version M74. The new image version is not yet mentioned in the Google public issue tracker, but you can find image version M74 in the console.

TensorFlow and CUDA not compatible

I'm trying to run code from GitHub that uses TensorFlow 1.x.
I'm using Colab for this, and I'm running into a problem I can't find a solution to.
I'm using TensorFlow 1.15, the installed CUDA version is 10.1, and the NVIDIA driver version in Colab is NVIDIA-SMI 450.51.05 Driver Version: 418.67.
When I run the other code above, CUDA seems to be functional.
I'm using the GPU mode on Colab.
Can someone help me, please?
Thanks.
Maybe the notebook session didn't attach the GPU; try restarting the session and wait for the notebook to allocate the required resources.
Please share the link to your Colab notebook, as I think you have not configured Colab to use the GPU. You can also follow these steps:
Go to Colab
In the Edit menu at the upper left corner, open Notebook settings
In the Hardware accelerator dropdown menu, select GPU.
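Once the GPU runtime is attached, you can sanity-check the TF 1.15 setup from a cell. A rough check, assuming the %tensorflow_version magic that Colab offered at the time (it has since been removed):
%tensorflow_version 1.x
import tensorflow as tf

# Should print 1.15.x and True once the GPU runtime is attached.
print(tf.__version__)
print(tf.test.is_gpu_available())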

How to download cuDNN straight from the NVIDIA website to my Linux instance on GCP

I want to install tensorflow-gpu on my Linux machine on Google Cloud Platform. I am not using a deep learning VM that GCP provides; I installed Anaconda on my Linux instance and now I want to install TensorFlow. I already installed the NVIDIA drivers and CUDA; they can be downloaded straight onto the cloud instance. But for cuDNN I have to download it to my local machine and then upload it to the cloud instance. Is there a way to download that file directly from the NVIDIA site to my cloud instance? Thank you.
EDIT
CUDNN_URL="developer.download.nvidia.com/compute/redist/cudnn/v5.1/cudnn-8.0-linux-x64-v5.1.tgz"
wget -c ${CUDNN_URL}
Using these commands we can directly download cuDNN v5.1, and I have seen links for version 6.5 as well. I tried the same link with the version I want, but it did not work. Does anyone know a way to use this CUDNN_URL pattern to download cuDNN v7.1 or higher directly with wget or curl, without logging into an NVIDIA account?
There was a change in the naming convention of cuDNN archives.
Since version 7.2.1, NVIDIA added the full version number into the archive name instead of the previously used short one.
That means that the resulting download link for 7.2.1 is:
https://developer.download.nvidia.com/compute/redist/cudnn/v7.2.1/cudnn-9.2-linux-x64-v7.2.1.38.tgz
instead of,
https://developer.download.nvidia.com/compute/redist/cudnn/v7.2.1/cudnn-9.2-linux-x64-v7.2.tgz
You can follow this pattern:
VERSION_FULL="8.1.0.77"
VERSION="${VERSION_FULL%.*}"
CUDA_VERSION="11.2"
OS_ARCH="linux-x64"
CUDNN_URL="https://developer.download.nvidia.com/compute/redist/cudnn/v${VERSION}/cudnn-${CUDA_VERSION}-${OS_ARCH}-v${VERSION_FULL}.tgz"
wget -c ${CUDNN_URL}
The resulting link would be:
https://developer.download.nvidia.com/compute/redist/cudnn/v8.1.0/cudnn-11.2-linux-x64-v8.1.0.77.tgz
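If wget is not available, or you prefer to script the download, the same naming pattern can be reproduced with Python's standard library; a sketch using the same version numbers as the example above:
import urllib.request

# Same naming pattern as the shell snippet above; adjust the versions as needed.
version_full = "8.1.0.77"
version = ".".join(version_full.split(".")[:-1])  # "8.1.0"
cuda_version = "11.2"
os_arch = "linux-x64"
filename = f"cudnn-{cuda_version}-{os_arch}-v{version_full}.tgz"
url = f"https://developer.download.nvidia.com/compute/redist/cudnn/v{version}/{filename}"
urllib.request.urlretrieve(url, filename)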
Because you need a developer account to get cuDNN, there are no direct links for downloading the files.
As a workaround, you can download cuDNN and other software to your local machine and then follow the documentation Transferring files to instances to copy the files to your VM instance:
For example, if you use Windows, I'd recommend WinSCP to copy files to your VM.
In addition, have a look at this article Deep Learning environment setup on Ubuntu(16.04) | CUDA, cuDNN, OpenCV(3.x), TensorFlow, Keras.
If you're really concerned (I was) about the data usage of downloading the CUDA and cuDNN files to your local machine and then uploading them to the GCP instance, you can set up a GUI for your GCP instance in no time. Check this https://www.youtube.com/watch?v=e3RnnmcNI_E or any VNC server tutorial. After that you can download any file directly using a web browser.

TensorFlow 2.0 beta GPU running in Jupyter Notebook, but not in Google Colab

I am working with TensorFlow 2.0 beta, and while I managed to get my GPU working in Anaconda through a few YouTube tutorials, I am unable to get my GPU running in Google Colab. I know Google has the option to enable a GPU on one of their servers, but my GTX 1070 is much faster, and I need to run off Colab and not just Jupyter exclusively.
So I read the documentation like a good boy, and the only thing I think I could have done wrong is my path settings; I have screenshots below.
I followed several different YouTube tutorials faithfully until the final one here gave me a way to install it for Jupyter. Which is great, but I also need it to run on Google Colab as well.
I've been trying this since Friday and it's now Tuesday, and I'm losing my mind over this. Help me Stack Overflow, you're my only hope.
https://imgur.com/a/8WibGWT
If you can get it running on your own Jupyter server, then you can point Colab at that local server.
Full instructions here: https://research.google.com/colaboratory/local-runtimes.html but edited highlights are:
install jupyter_http_over_ws:
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
start your local server, allowing the Colab domain:
jupyter notebook \
--NotebookApp.allow_origin='https://colab.research.google.com' \
--port=8888 \
--NotebookApp.port_retries=0
Click 'Connect to local runtime' in Colab.
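After the connection is established, cells executed in the Colab UI run on your local machine, so the GTX 1070 should be visible. A quick check, assuming TF 2.0 beta is installed in that local environment:
import tensorflow as tf

# TF 2.0 beta exposes the device listing under the experimental namespace;
# newer versions expose tf.config.list_physical_devices directly.
print(tf.config.experimental.list_physical_devices("GPU"))
print(tf.test.is_gpu_available())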