I left a TensorFlow model training overnight, but in the morning I found that the TensorFlow code had not run, even though the laptop was still on.
Can someone please help me keep the TensorFlow code running while I am away from the laptop?
Can someone tell me why, when I train my model using tensorflow-gpu in a Jupyter notebook, my dedicated GPU memory stays 85% in use even after training has completed? If I then try to run the same model or a modified one, I get the error "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize." To run another model I have to quit the Anaconda Prompt and relaunch Jupyter Notebook so that the memory is cleared. Is this happening to anyone else? Does anyone know how to clear the GPU memory?
When I train a model on a GPU machine, training gets interrupted by system patch processes. Since Google Cloud GPU machines do not support live migration, restarting the training every time this happens is painful. Google's documentation states clearly that there is no way around this other than restarting the machines.
Is there a clever way to detect that the machine has rebooted and resume training automatically?
Sometimes a kernel update also breaks the CUDA drivers, so the GPU is no longer visible and the drivers have to be reinstalled. So writing a startup script to resume training is not a bulletproof solution either.
Yes, there is. If you use TensorFlow, you can use its checkpointing feature to save your progress and pick up where you left off.
One great example of this is provided here: https://github.com/GoogleCloudPlatform/ml-on-gcp/blob/master/gce/survival-training/README-tf-estimator.md
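The save-and-resume idea behind checkpointing can be sketched framework-agnostically with the standard library. This is a minimal sketch, not the linked example's code: the JSON file name, the `gpu_visible` pre-flight check, and the placeholder loss are all hypothetical stand-ins, and with real model weights TensorFlow's `tf.train.Checkpoint` / `tf.train.CheckpointManager` (or an Estimator's `model_dir`, as in the linked README) would take the place of `save_state` and `load_state`.

```python
import json
import os
import shutil
import subprocess
import tempfile

CHECKPOINT_PATH = "train_state.json"  # hypothetical checkpoint file name


def gpu_visible():
    """Rough driver check (assumption): True if nvidia-smi is on PATH and exits with 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def save_state(state, path=CHECKPOINT_PATH):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on POSIX, so an interrupted save should leave either
    # the old or the new checkpoint, never a half-written one.
    os.replace(tmp_path, path)


def load_state(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": 0, "loss_history": []}
    with open(path) as f:
        return json.load(f)


def train(total_epochs, path=CHECKPOINT_PATH):
    """Resume from the last completed epoch and checkpoint after every epoch."""
    state = load_state(path)
    for epoch in range(state["epoch"], total_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for one real epoch of training
        state["loss_history"].append(loss)
        state["epoch"] = epoch + 1
        save_state(state, path)
    return state
```

Calling `train(10)` after a reboot skips the epochs already recorded in the checkpoint. A startup hook such as a cron `@reboot` entry or a systemd unit could call it, first checking `gpu_visible()` so the script stops early instead of silently training on a machine whose driver was broken by a kernel update.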
I have been trying to install Keras for about a week now. I installed Anaconda and then TensorFlow with Python 3.5 and Jupyter. When I start the Anaconda3 prompt, it always gives me the message
was unexpected at this time.
C:\Users\Ray Van>#IF NOT "==" #chcp > NUL
C:\Users\Ray Van>
I used to be able to just type
jupyter notebook, but now that doesn't work.
Also, I want to type activate tensorflow, then jupyter notebook, and then run a Python program with Keras (for neural networks), but no matter what I try, nothing works. I read somewhere that the space in my user name (\Ray Van) can be a problem, but I didn't set that up; Windows 10 did it automatically, and from reading various posts it seems very difficult to change without risking a Windows 10 reinstall. Various places say that Keras is very easy to install, but I have found the opposite after trying for three hours at a time over several days. I am not good at installing things like this and don't really understand how all the pieces are connected. Maybe I have to start over: install Anaconda, then TensorFlow, and then install Keras and Jupyter from within the tensorflow environment. I know the pip and conda commands are used for this, but I don't really understand them either. I am a total newbie who just wants to run some Python programs using Keras for my neural network research.
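One way to see how the pieces connect is to check, from inside the notebook, which Python interpreter is running and whether that interpreter can find each package: Jupyter runs whatever interpreter its kernel points to, and a package installed into a different environment is invisible to it. A minimal sketch, assuming the module names `tensorflow`, `keras` and `notebook` (Jupyter Notebook's package); it uses `str.format` rather than f-strings so it also runs on Python 3.5.

```python
import importlib.util
import sys


def check_environment(packages=("tensorflow", "keras", "notebook")):
    """Return the running interpreter's path and whether each package can be imported."""
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return sys.executable, found


if __name__ == "__main__":
    interpreter, found = check_environment()
    print("Python interpreter:", interpreter)
    for name, ok in found.items():
        print("{:12s} {}".format(name, "found" if ok else "MISSING"))
```

If the interpreter path is not inside the `tensorflow` environment, or Keras shows as missing, then installing it with `conda install` or `pip install` while that environment is activated is what puts it where this interpreter looks.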
I have previously asked whether it is possible to run TensorFlow with GPU support on a CPU. I was told that it is possible and was shown the basic code to switch which device to use, but not how to get the initial code working on a computer that has no GPU at all. For example, I would like to train on a computer that has an NVIDIA GPU but write the code on a laptop that only has a CPU. How would I go about doing this? I have tried just writing the code as normal, but it crashes before I can even switch devices. I am using Python on Linux.
This thread might be helpful: Tensorflow: ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
I've tried importing tensorflow with tensorflow-gpu loaded on my university's HPC login node, which has no GPUs, and it works well. I don't have an NVIDIA GPU in my laptop, so I have never gone through the installation process, but I think the cause is that it cannot find the relevant CUDA and cuDNN libraries.
But why don't you just use the CPU version? As @Finbarr Timbers mentioned, you can still run the model on a computer with a GPU.
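To test the theory that the CUDA and cuDNN libraries cannot be found, a quick diagnostic is to ask `ctypes.util.find_library` for them. This is a rough sketch under assumptions: the short names `cudart`, `cublas`, `cusolver` and `cudnn` are guesses at the libraries worth checking, `find_library`'s search rules are not identical to the dynamic loader's, and it does not check that a found library has the version TensorFlow was built against (such as the `libcusolver.so.8.0` in the linked error).

```python
import ctypes.util


def find_cuda_libraries(names=("cudart", "cublas", "cusolver", "cudnn")):
    """Map each short library name to what ctypes can locate, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_cuda_libraries().items():
        print("lib{}: {}".format(name, location or "NOT FOUND"))
```

On a CPU-only laptop every entry is expected to be missing, which is why installing the CPU-only `tensorflow` package there avoids the import error.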
What errors are you getting? It is very possible to train on a GPU but develop on a CPU; many people do it, including myself. In fact, TensorFlow will automatically place your code on a GPU if one is available.
If you add the following code to your model, you can see which devices are being used:
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
The placements printed in this log should change when you run the model on a computer with a GPU.
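To keep one script that runs on both machines, the device can be chosen at runtime instead of hard-coded. In TensorFlow 1.x, `tf.ConfigProto(allow_soft_placement=True)` also lets ops pinned to an unavailable GPU fall back to the CPU, though it will not help if the crash happens during `import tensorflow` itself. A minimal sketch; `pick_device` and the `FORCE_CPU` environment variable are hypothetical names, and the GPU count is passed in so the helper runs without TensorFlow.

```python
import os


def pick_device(num_gpus, env=None):
    """Pick a TensorFlow device string for this machine.

    num_gpus: how many GPUs the framework reports (0 on a CPU-only laptop).
    FORCE_CPU is a hypothetical environment variable that overrides the choice.
    """
    env = os.environ if env is None else env
    if env.get("FORCE_CPU") == "1" or num_gpus == 0:
        return "/cpu:0"
    return "/gpu:0"
```

With TensorFlow 2.x the count could come from `len(tf.config.list_physical_devices("GPU"))`, and the model would then be built inside `with tf.device(pick_device(...)):`.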
I am using Windows 7. After testing my GPU in TensorFlow, where a model I had already tested on the CPU ran awkwardly slowly, I switched to the CPU with:
with tf.device("/cpu:0"):
    ...  # model construction goes here
I assumed that I could switch back to the GPU with:
with tf.device("/gpu:0"):
    ...  # model construction goes here
However, when I tried to rerun with this configuration, I got the following error message from Windows:
The device "NVIDIA Quadro M2000M" is not a removable device and cannot be removed.
When I checked for my GPU with nvidia-smi, the system reported that the GPU was not there.
I restarted my laptop, checked with nvidia-smi whether the GPU was there, and it was recognized.
I imported TensorFlow and started my model again, but the same error message popped up and my GPU vanished.
Is there something wrong in one of the TensorFlow configuration files? Or the Keras files? What can I change to get this working again? Do you know why the GPU is so much slower than the 8 CPUs?
Solution: reinstalling tensorflow-gpu worked for me.
However, the question remains why this happened and how I can switch between the GPU and the CPU. I don't want to use a second virtual environment.
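One common way to switch without a second virtual environment is the `CUDA_VISIBLE_DEVICES` environment variable, which the CUDA runtime reads when it initialises: setting it to `-1` (an index that matches no GPU) hides every GPU, so TensorFlow runs on the CPU. It only takes effect if set before TensorFlow touches the GPU, so this sketch starts a fresh process with the chosen setting; `device_env` and `run_with_device` are hypothetical helpers, and the child command is a stand-in for a real training script.

```python
import os
import subprocess
import sys


def device_env(use_gpu, base=None):
    """Return a copy of the environment that shows or hides the GPUs."""
    env = dict(os.environ if base is None else base)
    if use_gpu:
        env.pop("CUDA_VISIBLE_DEVICES", None)  # no restriction: CUDA sees every GPU
    else:
        env["CUDA_VISIBLE_DEVICES"] = "-1"  # matches no GPU, so CUDA exposes none
    return env


def run_with_device(args, use_gpu):
    """Run a command in a fresh process with the chosen device visibility."""
    return subprocess.run(args, env=device_env(use_gpu), stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


if __name__ == "__main__":
    # Stand-in for a training script: the child only prints what it sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"]
    print(run_with_device(child, use_gpu=False).stdout.strip())
```

Inside a notebook the same effect comes from setting `os.environ["CUDA_VISIBLE_DEVICES"] = "-1"` in the first cell, before `import tensorflow`; switching back to the GPU then means restarting the kernel without that line.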