Is there a way to run RAPIDS without a GPU? I usually develop on a small local machine without a GPU, then push my code to a powerful remote server for real use. Things like TensorFlow allow switching between the CPU and GPU depending on if they're available. Can an equivalent thing be done with RAPIDS? Even if it's slow, being able to test things on a machine without a GPU would be extremely helpful.
There isn't a way to use RAPIDS without a GPU. Part of the reason is that we follow the APIs the community has already adopted in CPU packages such as pandas, NumPy, scikit-learn, and NetworkX. This way it should be as easy as swapping an import statement to get something working on the CPU vs the GPU.
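The import swap described above could be sketched roughly as follows. The helper name first_available is hypothetical, and the sketch assumes RAPIDS is simply not installed on the CPU-only machine, so that importing cudf raises ImportError there.

```python
import importlib


def first_available(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {module_names} could be imported")


# On the GPU server this would pick cudf; on the laptop it would fall back
# to pandas. Left commented out because neither package is assumed installed.
# xdf = first_available("cudf", "pandas")
# df = xdf.DataFrame({"a": [1, 2, 3]})
```

Code written against the shared subset of the two APIs can then use xdf on either machine. Anything cudf does not implement would still need a CPU-specific branch.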
Related
Every time I need to train a 'large' deep learning model I do it from Google Colab, as it allows you to use GPU acceleration.
My PC has a dedicated GPU, and I was wondering if it is possible to use it to run my notebooks locally in a fast way. Is it possible to train models using my PC's GPU? If so, how?
I am open to working with DataSpell, VSCode or any other IDE.
Nicholas Renotte has a great 'Getting Started' video that goes through the entire process of setting up GPU-accelerated notebooks on your PC. The part you're interested in starts around the 12 minute mark.
Yes, it is possible to run .ipynb notebooks locally using GPU acceleration. To do so, you will need to install the necessary libraries and frameworks such as TensorFlow, PyTorch, or Keras. Depending on the IDE you choose, you will need to install the relevant plugins and packages for GPU acceleration.
In terms of IDEs, DataSpell, VSCode, PyCharm, and Jupyter Notebook are all suitable for running notebooks locally with GPU acceleration.
Once the necessary libraries and frameworks are installed, you will then need to install the appropriate drivers for your GPU and configure the environment for GPU acceleration.
Finally, you will need to modify the .ipynb notebook to enable GPU acceleration and specify the number of GPUs you will be using. Once all the necessary steps have been taken, you will then be able to run the notebook locally with GPU acceleration.
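As a rough illustration of the last step, a notebook cell can check which GPUs the framework actually sees before training. This is only a sketch for TensorFlow 2.x; the function name visible_gpus is hypothetical, and it returns an empty list when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if there are none."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
# Pick a device string to pass to tf.device(...) when building the model.
device = "/GPU:0" if gpus else "/CPU:0"
print(f"{len(gpus)} GPU(s) visible, using {device}")
```

If the list is empty on a machine that has a GPU, the drivers or the CUDA libraries are the usual things to check.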
I am using Google Colab because I have some functions I need to execute that take far too long on my CPU. I have set the runtime to the GPU accelerator, but when I run the cell I still get this message: 'Warning: You are connected to a GPU runtime, but not utilizing the GPU'.
I understand that this means the code I am running is just using the CPU. However, on the CPU the function takes hours to execute, which is why I want to use Colab's GPU. Even when I change the runtime, it still uses the CPU. How do I specifically force Colab to use the GPU for executing a certain cell or function?
Edit: I have just found out that Colab apparently uses the GPU only when the package being used is specifically made for GPU usage. Is there some sort of external package I can use that forces a function to find a GPU to use before executing the function?
Edit: (The package I am using for the long calculation is NetworkX, if that makes any difference.)
Check out cuGraph, which lets you do the same graph calculations on the GPU as NetworkX. There is also a Medium post on compatibility between cuGraph and NetworkX graphs.
You only need to do a couple of things to get cuGraph working on Google Colab. As the Google Colab demo from this Medium post suggests:
Use pynvml to confirm Colab allocated you a Tesla T4 GPU.
Install most recent Miniconda release compatible with Google Colab's Python install (3.6.7)
Install RAPIDS libraries
Copy RAPIDS .so files into current working directory, a workaround for conda/colab interactions
Update env variables so Python can find and use RAPIDS artifacts
!wget -nc https://github.com/rapidsai/notebooks-extended/raw/master/utils/rapids-colab.sh
!bash rapids-colab.sh
import sys, os
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
And then you can do the same calculations on the GPU:
import cugraph
pagerank = cugraph.pagerank(G)
instead of
pagerank = nx.pagerank(G)
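The first step in the list above, confirming which GPU was allocated, might look something like this with pynvml. The function name first_gpu_name is hypothetical, and the sketch returns None when pynvml or an NVIDIA driver is missing instead of raising.

```python
def first_gpu_name():
    """Return the name of GPU 0, or None if pynvml or an NVIDIA driver is missing."""
    try:
        import pynvml
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No NVIDIA driver or NVML library on this machine.
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        name = pynvml.nvmlDeviceGetName(handle)
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()
    # Older pynvml releases return bytes, newer ones return str.
    return name.decode() if isinstance(name, bytes) else name


print(first_gpu_name())
```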
When I train a model on a GPU machine, it gets interrupted by system patch processes. Since Google Cloud GPU machines do not have a live migration option, it is a painful task to restart the training every time this happens. Google clearly states in its documentation that there is no way around this other than restarting the machines.
Is there a clever way to detect that the machine has rebooted and resume the training automatically?
Sometimes it also happens that due to some kernel update, the CUDA drivers stop working and GPU is not visible and CUDA drivers need a re-installation. So writing a startup script to resume the training is also not a bulletproof solution.
Yes, there is. If you use TensorFlow, you can use its checkpointing feature to save your progress and pick up where you left off.
One great example of this is provided here: https://github.com/GoogleCloudPlatform/ml-on-gcp/blob/master/gce/survival-training/README-tf-estimator.md
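The save-and-resume idea can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not TensorFlow's checkpoint API: the function names, the JSON file format, and the stop_after parameter (which simulates a reboot) are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def train(path, num_steps, stop_after=None):
    """Run from the last saved step up to num_steps, checkpointing after every step."""
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if stop_after is not None and step >= stop_after:
            return state  # simulated interruption
        state["total"] += step  # stand-in for one real training step
        state["step"] = step + 1
        save_checkpoint(path, state)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
interrupted = train(ckpt, 10, stop_after=4)  # stops early, as if the VM rebooted
resumed = train(ckpt, 10)                    # picks up at step 4, not step 0
```

Running the same training command from a startup script then resumes automatically after a reboot, although this does not help with the broken-driver case mentioned in the question.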
I previously asked whether it is possible to run TensorFlow with GPU support on a CPU. I was told that it is possible and was given the basic code to switch which device to use, but not how to get the initial code working on a computer that doesn't have a GPU at all. For example, I would like to train on a computer that has an NVIDIA GPU but program on a laptop that only has a CPU. How would I go about doing this? I have tried just writing the code as normal, but it crashes before I can even switch which device I want to use. I am using Python on Linux.
This thread might be helpful: Tensorflow: ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
I tried importing tensorflow with tensorflow-gpu loaded on my university's HPC login node, which does not have GPUs, and it works well. My laptop has no NVIDIA GPU, so I have never gone through the installation process there. I think the cause of your crash is that it cannot find the relevant CUDA and cuDNN libraries.
But why don't you just use the CPU version? As @Finbarr Timbers mentioned, you can still run the model on a computer with a GPU.
What errors are you getting? It is very possible to train on a GPU but develop on a CPU; many people do it, including myself. In fact, TensorFlow will automatically put your code on a GPU if possible.
If you add the following code to your model, you can see which devices are being used:
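import tensorflow as tf
# Creates a session with log_device_placement set to True (TensorFlow 1.x API).
# In TensorFlow 2.x, use tf.debugging.set_log_device_placement(True) instead.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))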
The devices listed in the log should change when you run your model on a computer with a GPU.
I have an application with AES-NI compiled in that supposedly selects the implementation at runtime based on cpuid. I want to test whether it really functions correctly on an old CPU without such dedicated instructions. VirtualBox cannot help because the CPU is the same. How can I do such a test without having access to an old CPU?