A python program (PyTorch, via Spyder, running two GPUs) was running smoothly and using the two GPUs. Liunx-Ubunto-16.04 went suspended, due to locking the screen; then after log-ing, Spyder detected 0 GPUs. The GPUs, nonetheless, were available at the terminal ($ nvidia-smi). I had to restart the computer to get the GPUs back into Spyder.
Maybe there is a better way to initialize the GPUs, aside from managing the power settings, as the GPUs might be toggled off for other reasons.
import torch
torch.cuda.is_available()
Out[4]: False
Related
Is there a way to run RAPIDS without a GPU? I usually develop on a small local machine without a GPU, then push my code to a powerful remote server for real use. Things like TensorFlow allow switching between the CPU and GPU depending on if they're available. Can an equivalent thing be done with RAPIDS? Even if it's slow, being able to test things on a machine without a GPU would be extremely helpful.
There isn't a way to use RAPIDS without a GPU, and part of the reason for that is we're following the APIs the community has adopted in CPU packages across Pandas, Numpy, SKLearn, NetworkX, etc. This way it should be as easy as swapping an import statement to get something working on the CPU vs the GPU.
Is it possible to run Tensorboard on a machine without CUDA support?
I'm working at a computation center (via ssh) which has two major clusters:
CPU-Cluster which is a general workhorse without CUDA support (no dedicated GPU)
GPU-Cluster with dedicated GPUs e.g. for running neural networks with tensorflow-gpu.
The access to the GPU-cluster is limited to Training etc. such that I can't afford to run Tensorboard on a machine with CUDA-support. Instead, I'd like to run Tensorboard on the CPU-Cluster.
With the TF bundled Tensorboard I get import errors due to missing CUDA support.
It seems reasonable that the official Tensorboard should have a mode for running with CPU-only. Is this true?
I've also found an inofficial standalone Tensorboard version (github.com/dmlc/tensorboard), does this work without CUDA-support?
Solved my problem: just install tensorflow instead of tensorflow-gpu.
Didn't work for me for a while due to my virtual environment (conda), which didn't properly remove tensorflow-gpu.
Tensorboard is not limited by whether a machine has GPU or not.
And as far as I know, what Tensorboard do is parsing events pb files and display them on web. There is not computing, so it doesn't need GPU.
I have previously asked if it is possible to run tensor flow with gpu support on a cpu. I was told that it is possible and the basic code to switch which device I want to use but not how to get the initial code working on a computer that doesn't have a gpu at all. For example I would like to train on a computer that has a NVidia gpu but program on a laptop that only has a cpu. How would I go about doing this? I have tried just writing the code as normal but it crashes before I can even switch which device I want to use. I am using Python on Linux.
This thread might be helpful: Tensorflow: ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
I've tried to import tensorflow with tensorflow-gpu loaded in the uni's HPC login node, which does not have GPUs. It works well. I don't have Nvidia GPU in my laptop, so I never go through the installation process. But I think the cause is it cannot find relevant libraries of CUDA, cuDNN.
But, why don't you just use cpu version? As #Finbarr Timbers mentioned, you still can run a model in a computer with GPU.
What errors are you getting? It is very possible to train on a GPU but develop on a CPU- many people do it, including myself. In fact, Tensorflow will automatically put your code on a GPU if possible.
If you add the following code to your model, you can see which devices are being used:
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
This should change when you run your model on a computer with a GPU.
I am using Windows 7. After i tested my GPU in tensorflow, which was awkwardly slowly on a already tested model on cpu, i switched to cpu with:
tf.device("/cpu:0")
I was assuming that i can switch back to gpu with:
tf.device("/gpu:0")
However i got the following error message from windows, when i try to rerun with this configuration:
The device "NVIDIA Quadro M2000M" is not exchange device and can not be removed.
With "nvida-smi" i looked for my GPU, but the system said the GPU is not there.
I restarted my laptop, tested if the GPU is there with "nvida-smi" and the GPU was recogniced.
I imported tensorflow again and started my model again, however the same error message pops up and my GPU vanished.
Is there something wrong with the configuration in one of the tensorflow configuration files? Or Keras files? What can i change to get this work again? Do you know why the GPU is so much slower that the 8 CPUs?
Solution: Reinstalling tensorflow-gpu worked for me.
However there is still the question why that happened and how i can switch between gpu and cpu? I dont want to use a second virtual enviroment.
I'm running a Python script using GPU-enabled Tensorflow. However, the program doesn't seem to recognize any GPU and starts using CPU straight away. What could be the cause of this?
Just want to add to the discussion that tensorflow may stop seeing a GPU due to CUDA initialization failure, in other words, tensorflow detects the GPU, but can't dispatch any op onto it, so it falls back to CPU. In this case, you should see an error in the log, like this
E tensorflow/stream_executor/cuda/cuda_driver.cc:481] failed call to cuInit: CUDA_ERROR_UNKNOWN
The cause is likely to be the conflict between different processes using GPU simultaneously. When that is the case, the most reliable way I found to get tensorflow working is to restart the machine. In the worst case, reinstalling tensorflow and / or NVidia driver.
See also one more case when GPU suddenly stops working.