Dedicated GPU memory doesn't clear - tensorflow

Can someone tell me why, when I train my model using tensorflow-gpu in a Jupyter notebook, my dedicated GPU memory is still 85% in use even after training has completed? Because of that, if I try to run the same model or a modified model, I get the error "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize." If I want to run another model, I have to quit the Anaconda Prompt and relaunch Jupyter notebook for the memory to clear. Is this happening to anyone else? Does anyone know how to clear the GPU memory?
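A commonly suggested mitigation, sketched with the TF1-style API used elsewhere on this page: enable memory growth so TensorFlow allocates GPU memory on demand instead of reserving nearly all of it up front. Note that TensorFlow only releases GPU memory when its process exits, so restarting the Jupyter kernel (rather than the whole Anaconda Prompt) is enough to clear it.
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving almost all of it
# when the first session is created.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)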

Related

How to check if a dlib model can be run on GPU only?

I have been searching for this on the internet but got confused. Some say certain dlib models cannot be run on a GPU, while others say dlib models can be configured to run on the GPU with the right CUDA settings.
If I am given a dlib model, how can I check whether it can be run on the GPU, rather than finding out by actually configuring it with CUDA and testing? If that isn't possible, can you explain why?
By the way, the model whose GPU support I am trying to understand is shape_predictor_68_face_landmarks.dat from https://pyimagesearch.com/2018/04/02/faster-facial-landmark-detector-with-dlib/
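One quick check that is independent of the model file: dlib exposes a build-time flag saying whether it was compiled with CUDA support. A minimal sketch (the dlib.cuda call assumes a CUDA-enabled dlib build):
import dlib

# True only if this dlib build was compiled with CUDA; whether a model
# can use the GPU depends on the build, not on the .dat file itself.
print(dlib.DLIB_USE_CUDA)

if dlib.DLIB_USE_CUDA:
    # Number of CUDA devices dlib can see.
    print(dlib.cuda.get_num_devices())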

TensorFlow mandatory GPU use

Our GPUs are in exclusive mode. Sometimes a user manually logs into a machine and steals a GPU.
How can I raise an exception whenever GPU initialization fails in a TensorFlow script? I noticed that when TensorFlow is unable to initialize the GPU, it prints an error message but runs on the CPU. I want it to stop instead of running on the CPU.
If you force any part of your graph to run on a GPU using:
with tf.device('/device:GPU:0'):
then your session's variable initializer will stop and throw an InvalidArgumentError when no GPU is available.
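A minimal TF1-style sketch of that behavior (allow_soft_placement defaults to False, so the hard placement below fails when no GPU is present):
import tensorflow as tf

# Hard-pin a variable to the GPU; with soft placement disabled (the
# default), initialization raises InvalidArgumentError without a GPU.
with tf.device('/device:GPU:0'):
    v = tf.Variable(tf.zeros([1]))

with tf.Session(config=tf.ConfigProto(allow_soft_placement=False)) as sess:
    sess.run(tf.global_variables_initializer())  # raises if GPU init fails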

How to develop for TensorFlow with GPU without a GPU

I have previously asked whether it is possible to run TensorFlow with GPU support on a CPU. I was told that it is, and I was shown the basic code for switching which device to use, but not how to get the initial code working on a computer that doesn't have a GPU at all. For example, I would like to train on a computer that has an NVIDIA GPU but program on a laptop that only has a CPU. How would I go about doing this? I have tried just writing the code as normal, but it crashes before I can even switch devices. I am using Python on Linux.
This thread might be helpful: Tensorflow: ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
I've tried importing tensorflow with tensorflow-gpu installed on our university HPC's login node, which has no GPUs, and it works fine. I don't have an NVIDIA GPU in my laptop, so I never went through the installation process there, but I think the cause of your crash is that TensorFlow cannot find the relevant CUDA and cuDNN libraries.
But why don't you just use the CPU version? As @Finbarr Timbers mentioned, you can still run your model on a computer with a GPU.
What errors are you getting? It is very possible to train on a GPU but develop on a CPU; many people do it, including myself. In fact, TensorFlow will automatically place your code on a GPU if possible.
If you add the following code to your model, you can see which devices are being used:
import tensorflow as tf
# Creates a session with log_device_placement set to True,
# so TensorFlow logs the device each op is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
This should change when you run your model on a computer with a GPU.
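As a sketch of the develop-on-CPU, train-on-GPU workflow (TF1-style API; the graph is illustrative): enabling allow_soft_placement lets the same script run on a laptop without a GPU instead of crashing.
import tensorflow as tf

# Pin the heavy op to the GPU; on a CPU-only laptop, soft placement
# lets TensorFlow fall back to the CPU instead of raising an error.
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    c = tf.matmul(a, b)

config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))  # runs on the GPU if present, otherwise on the CPU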

After switching from GPU to CPU in TensorFlow with tf.device("/cpu:0"), GPU undocks every time tf is imported

I am using Windows 7. After I tested my GPU in TensorFlow, which was awkwardly slow on a model already tested on the CPU, I switched to the CPU with:
tf.device("/cpu:0")
I was assuming that I could switch back to the GPU with:
tf.device("/gpu:0")
However, I got the following error message from Windows when I tried to rerun with this configuration:
The device "NVIDIA Quadro M2000M" is not exchange device and can not be removed.
With "nvida-smi" i looked for my GPU, but the system said the GPU is not there.
I restarted my laptop, tested if the GPU is there with "nvida-smi" and the GPU was recogniced.
I imported tensorflow again and started my model again, however the same error message pops up and my GPU vanished.
Is there something wrong with the configuration in one of the tensorflow configuration files? Or Keras files? What can i change to get this work again? Do you know why the GPU is so much slower that the 8 CPUs?
Solution: Reinstalling tensorflow-gpu worked for me.
However, the questions remain: why did that happen, and how can I switch between GPU and CPU? I don't want to use a second virtual environment.
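On the switching part: tf.device is a context manager and only affects ops created inside its with block; a bare tf.device("/cpu:0") call, as above, changes nothing in the graph. A minimal TF1-style sketch:
import tensorflow as tf

# tf.device only applies to ops built inside the "with" block;
# calling it bare has no effect on device placement.
with tf.device('/cpu:0'):
    c_cpu = tf.matmul(tf.ones([2, 2]), tf.ones([2, 2]))  # on the CPU

with tf.device('/gpu:0'):
    c_gpu = tf.matmul(tf.ones([2, 2]), tf.ones([2, 2]))  # on the GPU

with tf.Session() as sess:
    print(sess.run([c_cpu, c_gpu]))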

Tensorflow doesn't see GPU and uses CPU instead, how come?

I'm running a Python script using GPU-enabled TensorFlow. However, the program doesn't seem to recognize any GPU and starts using the CPU straight away. What could be the cause of this?
Just to add to the discussion: TensorFlow may stop seeing a GPU due to a CUDA initialization failure. In other words, TensorFlow detects the GPU but can't dispatch any op onto it, so it falls back to the CPU. In this case you should see an error in the log like this:
E tensorflow/stream_executor/cuda/cuda_driver.cc:481] failed call to cuInit: CUDA_ERROR_UNKNOWN
The cause is likely a conflict between different processes using the GPU simultaneously. When that is the case, the most reliable way I have found to get TensorFlow working again is to restart the machine; in the worst case, reinstall TensorFlow and/or the NVIDIA driver.
See also one more case where the GPU suddenly stops working.
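To check quickly whether TensorFlow can see and initialize the GPU at all, a small diagnostic sketch with the TF1-style API:
import tensorflow as tf
from tensorflow.python.client import device_lib

# Lists every device TensorFlow can actually initialize; a healthy
# setup shows a CPU entry plus one or more GPU entries.
print(device_lib.list_local_devices())

# Convenience check; returns False when CUDA initialization fails.
print(tf.test.is_gpu_available())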