I am working on a project that involves training a deep learning model for a very long time (estimated at around 30 hours on my PC). I am running this in a Jupyter notebook on my Windows 10 PC with an NVIDIA GTX 1050 Ti GPU.
The problem is that after running for 24 hours, the kernel automatically becomes idle. I have also tried running this in JupyterLab, but the result is the same: the kernel goes idle right after 24 hours of running.
I have also checked the GPU usage in Task Manager to see whether the training is still running in the background. It is not: GPU usage is down to just 1–2%, whereas it was around 50% while the training was going on.
So my question is: is it possible to run a Jupyter notebook for that many hours? If yes, how? Is there a default setting that needs to be changed to run a notebook for more than 24 hours?
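As a point of reference, one default worth ruling out is Jupyter's idle-kernel culling on the notebook server; culling is disabled out of the box, so it may or may not be the cause here. A minimal sketch of where those settings live (the option names are Jupyter's MappingKernelManager traits, set in the server config file):

```python
# Excerpt of ~/.jupyter/jupyter_notebook_config.py
# (generate it with `jupyter notebook --generate-config` if it does not exist).
# These traits control idle-kernel culling; the values shown are the defaults, which disable it.
c.MappingKernelManager.cull_idle_timeout = 0   # seconds of idle time before a kernel is culled; 0 disables culling
c.MappingKernelManager.cull_interval = 300     # how often (in seconds) the culling check runs
c.MappingKernelManager.cull_connected = False  # if True, cull even kernels with an open browser connection
```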
Related
I know there is a limit on how much GPU time you can use on Google Colab, but what if you are just running a regular CPU script? Is there a limit on how long I can run it?
I found this question, but it is unclear whether it is talking about runtimes with or without a GPU.
From their docs:
Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage.
If your notebook is not idle: 12 hours. If it is: 90 minutes after it becomes idle. This applies whether you are using a GPU or only the CPU.
I have not been able to connect to a GPU on my Colab Pro+ for more than 5 days. I suppose this must be a bug.
I understand that GPUs are allocated dynamically, but if someone pays for Pro+ it would be essential to get at least a smaller GPU. I cannot even develop anything at the moment, because I can't run certain code without a GPU.
For 10 days this month I was not using Colab (holidays), then came some days of use, and now 5 days with the error message "Cannot connect to GPU backend".
Trying to contact support is a nightmare: I have only gotten automated answers for 5 days in a row. It's an interesting platform, but if there is a problem you are screwed and your $50/month is just wasted.
I have a neural net that takes about 7–15 days to train on several GPUs.
Google Colab disconnects after the script has been running for a few hours. There are a couple of "hacks" you can use to keep the session alive, but this is obviously not officially supported.
Once I have written my script in Google Colab, how should I go about running it for a long period of time? I'm connected to and using Google's deep learning VMs.
There is currently no way of running scripts for such long periods (i.e., days) in the free version of Colab; in fact, it is clear from the Resource Limits section of the official FAQ that the maximum running time is 12 hours (emphasis added):
How long can notebooks run in Colab?
Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage. This is necessary for Colab to be able to offer computational resources for free. Users interested in longer VM lifetimes and more lenient idle timeout behaviors that don't vary as much over time may be interested in Colab Pro.
So, if you really need running times on the order of days, you should consider Colab Pro.
I recently upgraded to Colab Pro. I am trying to use the GPU resources from Colab Pro to train my Mask RCNN model. I was allocated around 15 GB of memory when I tried to run the model right after I signed up for Pro. However, for some reason, I was allocated just 1 GB of memory from the next morning, and I haven't been allocated more than 1 GB since. I was wondering whether I am missing something or have somehow disturbed the VM's built-in packages. I understand that the allocation varies from day to day, but it has been like this for almost 3 days now. The following attempts have already been made to improve the situation, but none seems to work.
I have made sure that the GPU and "High-RAM" options are selected.
I have tried restarting the runtime several times.
I have tried running other scripts (just to make sure the problem was not with the Mask RCNN script).
I would appreciate any suggestions on this issue.
[Screenshot: GPU info]
The "High-RAM" setting in that screen controls the system RAM rather than GPU memory.
The command !nvidia-smi will show GPU memory. For example:
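A minimal sketch of what that looks like in a notebook cell (the programmatic check assumes PyTorch, which Colab preinstalls; any CUDA-aware framework would do):

```python
# Show the assigned GPU and its current memory usage.
!nvidia-smi

# Or query the same numbers programmatically (assumes PyTorch is available).
import torch
free_bytes, total_bytes = torch.cuda.mem_get_info()
used_gb = (total_bytes - free_bytes) / 1e9
print(f"GPU memory: {used_gb:.1f} GB used of {total_bytes / 1e9:.1f} GB total")
```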
The memory figures in that output show the GPU memory utilization: in this case, 0 of 16 GB in use.
I've read frequently (here, here, and in tons of other places) that the VMs at Google Colab time out after 12 hours. However, those discussions are always about TPU- and GPU-accelerated VMs.
What about non-hardware-accelerated ones? Is there a time limit? Is it also 12 hours?
Quoted directly from the Colaboratory FAQ:
Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage.
In short, yes. Non-accelerated runtimes also time out after 12 hours.