How to download cuDNN straight from the NVIDIA website to my Linux instance on GCP - tensorflow

I want to install tensorflow-gpu on my Linux machine on Google Cloud Platform. I am not using a Deep Learning VM that GCP provides, so I installed Anaconda on my Linux instance and now I want to install TensorFlow. I have already installed the NVIDIA drivers and CUDA; they can be downloaded straight onto the cloud instance. But for cuDNN I have to download it to my local machine and then upload it to the cloud instance. Is there a way to download that file directly from the NVIDIA site to my cloud instance? Thank you
EDIT
CUDNN_URL="developer.download.nvidia.com/compute/redist/cudnn/v5.1/cudnn-8.0-linux-x64-v5.1.tgz"
wget -c ${CUDNN_URL}
Using these commands we can directly download cuDNN v5.1, and I have seen links for version 6.5 as well. I tried the same link with the version I want substituted in, but it did not work. Does anyone know a way to use this CUDNN_URL pattern to directly download cuDNN v7.1 or higher using wget or curl, without logging into an NVIDIA account?

There was a change in the naming convention of cuDNN archives.
Since version 7.2.1, NVIDIA added the full version number into the archive name instead of the previously used short one.
That means that the resulting download link for 7.2.1 is:
https://developer.download.nvidia.com/compute/redist/cudnn/v7.2.1/cudnn-9.2-linux-x64-v7.2.1.38.tgz
instead of:
https://developer.download.nvidia.com/compute/redist/cudnn/v7.2.1/cudnn-9.2-linux-x64-v7.2.tgz
You can follow this pattern:
VERSION_FULL="8.1.0.77"
VERSION="${VERSION_FULL%.*}"
CUDA_VERSION="11.2"
OS_ARCH="linux-x64"
CUDNN_URL="https://developer.download.nvidia.com/compute/redist/cudnn/v${VERSION}/cudnn-${CUDA_VERSION}-${OS_ARCH}-v${VERSION_FULL}.tgz"
wget -c ${CUDNN_URL}
The resulting link would be:
https://developer.download.nvidia.com/compute/redist/cudnn/v8.1.0/cudnn-11.2-linux-x64-v8.1.0.77.tgz
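If you are unsure whether a constructed URL actually exists, a quick check (assuming curl is available) is to request only the headers before committing to the full download:
curl -sI "${CUDNN_URL}" | head -n 1   # an HTTP 200 response means the archive exists at that URL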

Because you need to have a developer account to get cuDNN, there are no direct links to download the files.
As a workaround, you can download cuDNN and other software to your local machine and then follow the documentation Transferring files to instances to copy the files to your VM instance.
For example, if you use Windows, I'd recommend WinSCP to copy files to your VM.
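If you have the Cloud SDK installed locally, one way to do the copy from a terminal is shown below; the archive filename, instance name, and zone are placeholders to replace with your own:
# run on your local machine after downloading the archive from NVIDIA
gcloud compute scp cudnn-11.2-linux-x64-v8.1.0.77.tgz my-instance:~/ --zone=us-central1-a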
In addition, have a look at this article: Deep Learning environment setup on Ubuntu (16.04) | CUDA, cuDNN, OpenCV (3.x), TensorFlow, Keras.

If your really concerned about(I was) data to download cuda and cudnn files to your local machine and then upload it to the gcp instance. You can set up an GUI for your GCP instance in no time. check this https://www.youtube.com/watch?v=e3RnnmcNI_E or any vnc server tutorial. After that you can directly download any file from using a web browser.
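As a rough sketch, one common minimal setup on an Ubuntu instance looks like the following; the package choices (XFCE, TightVNC, Firefox) are just one option among many:
# install a lightweight desktop, a VNC server, and a browser
sudo apt-get update
sudo apt-get install -y xfce4 xfce4-goodies tightvncserver firefox
vncserver :1   # prompts you to set a password; then connect a VNC client to port 5901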

Related

Stopping and starting a deep learning google cloud VM instance causes tensorflow to stop recognizing GPU

I am using the pre-built Deep Learning VM instances offered by Google Cloud, with an NVIDIA Tesla K80 GPU attached. I chose to have TensorFlow 2.5 and CUDA 11.0 automatically installed. When I start the instance, everything works great - I can run:
import tensorflow as tf
tf.config.list_physical_devices()
and the call returns the CPU, accelerated CPU, and GPU. Similarly, if I run tf.test.is_gpu_available(), the function returns True.
However, if I log out, stop the instance, and then restart it, the same exact code only sees the CPU and tf.test.is_gpu_available() returns False. I get an error that looks like the driver initialization is failing:
E tensorflow/stream_executor/cuda/cuda_driver.cc:355] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
Running nvidia-smi shows that the machine still sees the GPU, but TensorFlow cannot.
Does anyone know what could be causing this? I don't want to have to reinstall everything every time I restart the instance.
Some people (sadly not me) are able to resolve this by setting the following at the beginning of their script/main:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
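Note that this only helps if it runs before TensorFlow initializes CUDA. An equivalent approach, assuming a bash shell and a hypothetical script name, is to export the variable before launching:
export CUDA_VISIBLE_DEVICES=0   # expose only GPU 0 to CUDA applications
python train.py                 # train.py is a placeholder for your own script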
I had to reinstall the CUDA drivers, and from then on it worked even after restarting the instance. You can select your system configuration on NVIDIA's website, and it will give you the commands you need to run to install CUDA. It also asks whether you want to uninstall the previous CUDA version (yes!). This is luckily also very fast.
I fixed the same issue with the commands below, taken from https://issuetracker.google.com/issues/191612865?pli=1
gsutil cp gs://dl-platform-public-nvidia/b191551132/restart_patch.sh /tmp/restart_patch.sh
chmod +x /tmp/restart_patch.sh
sudo /tmp/restart_patch.sh
sudo service jupyter restart
Option-1:
Upgrade the Notebooks instance's environment. Refer to the link to upgrade (a command-line sketch follows below).
Notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
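A sketch of the upgrade from the command line, assuming the gcloud CLI is installed and using placeholder instance name and location, would be:
# check whether the instance can be upgraded, then upgrade it
gcloud notebooks instances is-upgradeable my-notebook --location=us-central1-a
gcloud notebooks instances upgrade my-notebook --location=us-central1-a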
Option-2:
Connect to the notebook VM via SSH and run the commands in the link.
After executing the commands, the CUDA version will be updated to 11.3 and the NVIDIA driver version to 465.19.01.
Restart the notebook VM.
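After the restart, you can confirm the versions with nvidia-smi:
nvidia-smi   # the header should now report driver version 465.19.01 and CUDA version 11.3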
Note: the issue has been solved in the GPU images. New notebooks will be created with image version M74. The new image version is not yet mentioned in the Google public issue tracker, but you can find image version M74 in the console.

NVIDIA flash.sh file not found in Ubuntu 18

I want to create an image from an NVIDIA Jetson TX2. Several places (like https://developer.ridgerun.com/wiki/index.php?title=Cloning_TX2) talk about a flash.sh file that performs the task, but I cannot find it; I also searched using find / -iname flash.sh and found nothing. Where can I find this file? Do I need to install something else?
Ubuntu 18.04.5 LTS, JetPack 4.5.1
Download the L4T Driver Package (BSP) from https://developer.nvidia.com/embedded/linux-tegra; be sure to download the correct version for your JetPack. For 4.5.1, use this:
https://developer.nvidia.com/embedded/l4t/r32_release_v5.1/r32_release_v5.1/t186/tegra186_linux_r32.5.1_aarch64.tbz2
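A sketch of fetching and unpacking the BSP follows; flash.sh should sit at the top of the extracted Linux_for_Tegra directory:
wget https://developer.nvidia.com/embedded/l4t/r32_release_v5.1/r32_release_v5.1/t186/tegra186_linux_r32.5.1_aarch64.tbz2
tar xf tegra186_linux_r32.5.1_aarch64.tbz2   # extracts to Linux_for_Tegra/
ls Linux_for_Tegra/flash.sh                  # the flash script you were looking for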
Alternatively, try downloading this script:
https://gist.github.com/Davidnet/013ceb704ebdc7ebd728e059f90fca80
Put it in your path, then run:
./flash.sh

How to use a remote machine's GPU in jupyter notebook

I am trying to run tensorflow on a remote machine's GPU through Jupyter notebook. However, if I print the available devices using tf, I only get CPUs. I have never used a GPU before and am relatively new at using conda / jupyter notebook remotely as well, so I am not sure how to set up using the GPU in jupyter notebook.
I am using an environment set up by someone else who already executed the same code on the same GPU, but they did it via python script, not in a jupyter notebook.
This is the only code in the other person's file that has to do with the GPU:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session  # Keras helper that installs the session
config = tf.ConfigProto()                # TF1-style session configuration
config.gpu_options.allow_growth = True   # allocate GPU memory on demand instead of all at once
set_session(tf.Session(config=config))
I think the problem was that I had tensorflow in my environment instead of tensorflow-gpu. But now I get the message "cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version", and I don't know how to update the driver through the terminal.
How is your environment set up? Specifically, what is your remote environment, and what is your local environment? Sounds like your CUDA drivers are out of date, but it could be more than just that. If you are just getting started, I would recommend finding an environment that requires little to no configuration work on your part, so you can get started more easily/quickly.
For example, you can run GPUs on the cloud and connect to them via a local terminal. You can also have your "local" frontend be Colab by connecting it to a local runtime. (This video explains that particular setup, but there are lots of other options.)
You may also want to try running nvidia-smi on the remote machine to see if the GPUs are visible.
Here is another solution that describes how to set up a GPU JupyterLab instance with Docker.
To update your drivers via terminal, run:
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot
Are your CUDA paths set appropriately? Like this?
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
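A quick sanity check after setting them, assuming the toolkit lives under /usr/local/cuda:
source ~/.bashrc         # reload if you added the exports to your shell profile
nvcc --version           # should print the installed CUDA toolkit version
echo "$LD_LIBRARY_PATH"  # should include /usr/local/cuda/lib64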

tensorflow probability installation

I am trying to install TensorFlow Probability for Windows offline, without going through the internet, so that I can avoid network firewall issues, but I could not find any instructions on how to do it. Any suggestions?
TensorFlow Probability is pure Python, so there is no need for a Windows build of it.
You should be able to download the file from https://pypi.org/project/tensorflow-probability/#files
and pip install it in a conda or Python environment.
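A sketch of the offline workflow (the directory name is arbitrary, and exact package filenames will vary) is:
# on a machine with internet access: download the package plus its dependencies
pip download tensorflow-probability -d tfp_packages/
# copy the tfp_packages/ directory to the offline machine, then install from it
pip install --no-index --find-links tfp_packages/ tensorflow-probability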

Is it possible to compile tensorflow in Mac?

So I started to build TensorFlow on a Mac, and the thing is that it doesn't seem possible to build TensorFlow on the macOS platform.
After following the instructions here, I get this package directory.
It seems like the Bazel build settings are only for a Linux distro. The reason I think so is that there is a .so file in the package directory that needs to be linked after importing TensorFlow using the Python binary.
This is the result I get after importing tensorflow using python.
Is there any other way I can build tensorflow on Mac OS?
It seems like there is no option but to install TensorFlow with pip. So I just created a new virtual machine and installed Ubuntu 16.04 to use as my Docker host. By doing so, I can create a new Docker container which can link and execute the Linux library.
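For example, assuming Docker is installed on the Ubuntu host, the official TensorFlow image can be started like this:
# pull and run the official TensorFlow image, dropping into a shell
docker run -it tensorflow/tensorflow bash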