I have been trying to install cuDF on Google Colab for hours. One of the requirements is that cuDF needs a Tesla T4 GPU, but Google Colab gives me a Tesla K80 every time, so I cannot install cuDF. I used this snippet of code to check which type of GPU I got each time:
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)
if device_name != b'Tesla T4':
    raise Exception("""
    Unfortunately this instance does not have a T4 GPU.
    Please make sure you've configured Colab to request a GPU instance type.
    Sometimes Colab allocates a Tesla K80 instead of a T4. Resetting the instance.
    If you get a K80 GPU, try Runtime -> Reset all runtimes...
    """)
else:
    print('Woo! You got the right kind of GPU!')
It is frustrating trying to get a specific type of GPU from Google Colab, because it comes down to luck. I am asking here to see if someone has experienced the same issue, and how it was solved.
The K80 uses the Kepler GPU architecture, which is not supported by RAPIDS. Colab itself can no longer run the latest versions of RAPIDS. You can try SageMaker Studio Lab for a Try it Now experience: https://github.com/rapidsai-community/rapids-smsl.
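Since the root cause is the GPU's compute capability rather than the name itself, a check like the snippet above can also be written against a small architecture table. This is only a sketch: the table below and the minimum compute capability of 6.0 (Pascal) for RAPIDS are my assumptions for illustration, so verify them against the RAPIDS system requirements.

```python
# Sketch: map a few common Colab/data-center GPUs to (architecture,
# compute capability) and compare against an assumed RAPIDS minimum of 6.0.
RAPIDS_MIN_COMPUTE = (6, 0)  # assumption: Pascal or newer

GPU_INFO = {
    "Tesla K80":  ("Kepler", (3, 7)),
    "Tesla P4":   ("Pascal", (6, 1)),
    "Tesla T4":   ("Turing", (7, 5)),
    "Tesla V100": ("Volta",  (7, 0)),
}

def supports_rapids(gpu_name: str) -> bool:
    """True if the GPU's compute capability meets the assumed RAPIDS minimum."""
    _arch, cc = GPU_INFO[gpu_name]
    return cc >= RAPIDS_MIN_COMPUTE

print(supports_rapids("Tesla K80"))  # Kepler is too old
print(supports_rapids("Tesla T4"))
```

In practice you would feed this the name returned by pynvml, as in the snippet above.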
I'm using Anaconda prompt to install:
Tensorflow 2.10.0
cudatoolkit 11.3.1
cudnn 8.2.1
I'm using Windows 11 and an RTX 3070 NVIDIA graphics card, and all the drivers have been updated.
I also tried downloading another version of CUDA and cuDNN as .exe installers directly from the CUDA website, and added the directories to the system PATH. The folder looks like this:
But whenever I type in:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
It gives me Num GPUs Available: 0
And sure enough, it uses my CPU for all the computation.
It worked once when I used Colab with the GPU accelerator and created a session on my GPU; that time the GPU load was maxed out. But later, I don't know how, I could no longer use my own GPU for training in Colab, or even their default free GPU.
Please help. ChatGPT doesn't give me correct information, since it only draws on knowledge from before 2020. It keeps telling me to install 'tensorflow-gpu', which has since been removed.
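One likely culprit in the setup above is the CUDA/cuDNN pairing: as far as I recall from TensorFlow's tested-configurations table, TF 2.10 on native Windows is built against CUDA 11.2 and cuDNN 8.1, not cudatoolkit 11.3.1 / cuDNN 8.2.1. A tiny sketch of such a version check (the table values are my recollection, so verify them against the official TensorFlow install docs):

```python
# Sketch: compare an installed CUDA/cuDNN pair against TensorFlow's tested
# build configuration. The entry below is an assumption for illustration.
TESTED = {
    "2.10": {"cuda": "11.2", "cudnn": "8.1"},
}

def matches_tested(tf_ver: str, cuda: str, cudnn: str) -> bool:
    """True if (cuda, cudnn) is the tested combination for this TF version."""
    cfg = TESTED.get(tf_ver)
    return cfg is not None and cfg["cuda"] == cuda and cfg["cudnn"] == cudnn

print(matches_tested("2.10", "11.3", "8.2"))  # the combination installed above
```

A mismatch here does not always break GPU detection, but it is the first thing worth ruling out.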
I tried to use an NVIDIA GeForce GTX 1080 Ti GPU to work with TFJS and followed the hardware and software requirements as stated in the documentation, but I could not see a drastic difference in performance yet. It seems like it's ignoring the GPU.
I am unsure whether I'm following the correct guidelines, as the documentation above seems to be for TensorFlow in Python.
Do I need to change some more settings to use the GPU version of TensorFlow.js for Node?
The difference in performance is massive on anything but trivial tasks (where the CPU is faster simply because it takes time to move ops and data to and from the GPU).
Like you said, it's probably not using the GPU to start with, which is most commonly due to CUDA issues (it's non-trivial to get the CUDA installation and versions right).
Increase TensorFlow's logging verbosity with export TF_CPP_MIN_LOG_LEVEL=0 before running your node app; you should see something like this in the output when it's using the GPU:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9575 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
If you don't see that, go back to the docs and make sure CUDA is installed correctly.
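If you capture that startup output to a file, the check can be automated by grepping for the "Created device .../device:GPU:N" line. A minimal sketch (the regex is my assumption about the log format, based only on the sample line above):

```python
import re

# Sketch: scan TensorFlow startup logs (captured from stderr after setting
# TF_CPP_MIN_LOG_LEVEL=0) for "Created device .../device:GPU:N" lines.
GPU_LINE = re.compile(r"Created device .*?/device:GPU:(\d+)")

def gpus_in_log(log_text: str) -> list:
    """Return the GPU indices that TensorFlow reported creating devices for."""
    return [int(m.group(1)) for m in GPU_LINE.finditer(log_text)]

# The sample line is taken from the answer above.
sample = ("tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] "
          "Created device /job:localhost/replica:0/task:0/device:GPU:0 "
          "with 9575 MB memory")
print(gpus_in_log(sample))  # [0]
```

An empty list means the log shows no GPU device being created, which points back at the CUDA installation.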
I'm running a UNet Keras model on a GCP instance with one NVIDIA Tesla P4 GPU, but it does not detect the GPU; instead it runs on the CPU. P.S. I installed the drivers and tensorflow-gpu, but it won't work. How can I fix this issue?
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (instance-1): /proc/driver/nvidia/version does not exist
Num GPUs Available: 0
You need to install the driver first. Follow this instruction
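The error message above says it directly: /proc/driver/nvidia/version does not exist, which means the NVIDIA kernel driver is not loaded. A quick preflight check you can run before importing TensorFlow (a sketch, Linux-only; the path is taken from the error message):

```python
import os

def nvidia_driver_loaded(proc_path: str = "/proc/driver/nvidia/version") -> bool:
    """True if the NVIDIA kernel driver exposes its version in /proc."""
    return os.path.exists(proc_path)

if not nvidia_driver_loaded():
    print("NVIDIA kernel driver not loaded; install the driver "
          "before importing TensorFlow.")
```

If this prints the warning, no amount of reinstalling tensorflow-gpu will help until the driver itself is installed.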
I am training a CNN on GCP's notebook using a Tesla V100. I've trained a simple yolo on my own custom data and it was pretty fast but not very accurate. So, I decided to write my own code from scratch to solve the specific aspects of the problem that I want to tackle.
I tried running my code on Google Colab before GCP, and it went well: TensorFlow detected the GPU and was able to use it, whether it was a Tesla K80 or a T4.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
tf.test.is_gpu_available() #>>> True
My problem is that this same function returns False in the GCP notebook, as if TensorFlow were unable to use the GPU it detected on the GCP VM. I don't know of any command that forces TensorFlow to use the GPU over the CPU, since it does that automatically.
I have already tried installing, or uninstalling and then reinstalling, several versions of tensorflow, tensorflow-gpu, and tf-nightly-gpu (1.13 and 2.0dev, for instance), but it yielded nothing.
output of nvidia-smi
Have you tried using GCP's AI Platform Notebooks instead? They offer VMs that are pre-configured with TensorFlow and have all the required GPU drivers installed.
I am using Windows 7. After I tested my GPU in TensorFlow, which was awkwardly slow on a model already tested on the CPU, I switched to the CPU with:
tf.device("/cpu:0")
I was assuming that I could switch back to the GPU with:
tf.device("/gpu:0")
However, I got the following error message from Windows when I tried to rerun with this configuration:
The device "NVIDIA Quadro M2000M" is not exchange device and can not be removed.
With "nvidia-smi" I looked for my GPU, but the system said the GPU was not there.
I restarted my laptop, tested whether the GPU was there with "nvidia-smi", and the GPU was recognized.
I imported TensorFlow again and started my model again; however, the same error message popped up and my GPU vanished.
Is there something wrong with the configuration in one of the TensorFlow configuration files? Or the Keras files? What can I change to get this working again? And do you know why the GPU is so much slower than the 8 CPUs?
Solution: Reinstalling tensorflow-gpu worked for me.
However, there is still the question of why that happened and how I can switch between GPU and CPU. I don't want to use a second virtual environment.
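On the switching question: tf.device() returns a context manager, so calling tf.device("/cpu:0") on a line by itself doesn't switch anything; device placement only applies to ops created inside a `with` block. A toy stand-in (pure Python, no TensorFlow; the names here are mine, not TF's API) that mimics the pattern:

```python
from contextlib import contextmanager

# A stack of device names; the top is where new "ops" would be placed.
_current_device = ["/cpu:0"]

@contextmanager
def device(name: str):
    """Toy stand-in for tf.device: placement applies only inside `with`."""
    _current_device.append(name)
    try:
        yield
    finally:
        _current_device.pop()

def create_op() -> str:
    """Report where a newly created op would be placed."""
    return _current_device[-1]

device("/gpu:0")        # no effect: the context manager is never entered
print(create_op())      # /cpu:0

with device("/gpu:0"):  # correct pattern, as with tf.device("/gpu:0")
    print(create_op())  # /gpu:0
```

With the real API, wrapping model construction in `with tf.device("/gpu:0"):` (or "/cpu:0") is how you move between devices without a second environment.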