In the last week or two I have seen frequent disconnects while trying to run a lengthy training run. A month or two ago this seemed to be working pretty reliably. My code has definitely changed but those internal details seem unrelated to the operation of Colab.
(On the other hand, I did switch my local machine from an Intel MacBook Pro running Big Sur to an M1 (Apple Silicon) MacBook Pro running Monterey. I assume that does not matter to Colab running in the cloud, via a Chrome browser.)
I see two kinds of disconnects:
There are “faux disconnects” which seem like false positives from the disconnect detector. These last less than a second, then the computation continues apparently unscathed. A black notification slides up from the lower left corner of the window, then slides back. See a link to a video of this below.
Then there are “real disconnects.” I start a computation that I expect to run for several hours. I see “faux disconnects” happen frequently. But less than an hour into the computation, I find the Colab window idle, with no status information and a Reconnect button in the upper right corner.
Link to video. I started this session around 1:03 pm. This video was recorded at 1:35 pm. Normally the training session should have run for several hours. Instead it died at 1:52 pm (~50 minutes into the run). See some additional comments in an issue at GitHub.
Can anyone help me understand how to get past this? I am currently unable to make progress in my work because I cannot complete a training run before my Colab runtime decides to disconnect.
Edit:
FYI: since once a “real disconnect” happens it is too late to look at the (no longer connected) runtime's log, and since this seems to run for about an hour before disconnecting, I saved a log file when a run was about 10 minutes in.
Edit on August 1, 2022:
My real problem is the “real disconnect” on my real Colab notebook. But my notebook is overly complicated, so not a good test case. I tried to make a small test case, see Colab notebook: DisconnectTest.ipynb. It contains a generic NIST-based Keras/TensorFlow benchmark from the innertubes. I made a screen grab video of the first 2.5 minutes of a run. While this run completes OK — that is, there are no “real disconnects” — it had several “faux disconnects.” The first one is at 1:36. These seem fairly benign, but they do disrupt the Resources panel on the right. This makes it hard to know if the source of the “real disconnect” has anything to do with exhausting resources.
As I described in a parallel post on Colab's Issue #2965 on GitHub, this appears to be “some interaction between Colab and Chrome (and perhaps macOS Monterey (Version 12.5.1), and perhaps M1 Apple Silicon). Yet Colab seems to work fine on M1/Monterey/Safari.”
As described there, a trivial Colab example fails on Chrome browser but works fine on Safari.
Related
Over the past couple of months I have been experiencing BSODs (some for other reasons, but now it's just this one exe) and occasional black screens.
I have gone through what I believe to be every driver and updated them, run the Windows Memory Diagnostic tool and Driver Verifier, and run 2.5 passes of Memtest, none of which reported any issues. I have also reset my computer to factory defaults (which is why I don't have the minidumps that I had before) and looked up everything I could to troubleshoot this issue. I'll take any good advice I can get from this point on and will answer any questions I can. Here are the minidumps I have so far:
https://drive.google.com/drive/folders/1VqgX2KXoM32E6K0jAng-WoHZXDI8SOv1?usp=sharing
I am running a deep learning training program in my Colab notebook that will take about 10 hours. If I close my browser, will Google shut it down before it finishes? Or will the last output be saved correctly in my Drive?
I suggest you look here and here. Basically, the code should keep running, but after some time (around 90 minutes) of inactivity the notebook will be cut off, so I assume that what you suggest is not viable. Maybe you could launch the script in the morning and interact with it every 20-30 minutes to prevent it from going idle. Also, consider using Google Colab Pro (faster GPUs and longer runtimes, but never longer than 24 hours).
The simple answer to that question is a solid no. Your session will either continue executing or stay idle; as stated in @SilentCloud's answer above, it will last for about
90 minutes [with CPU]
30 minutes [with GPU]
The reason I say 30 minutes with GPU is that I have personally tested it and this appears to be the number, as I use it on a rather regular basis.
You can make a simple bot on your own machine using pyautogui to generate some random activity, if that makes more economic sense or you are not interested in a Google Colab Pro subscription.
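As a rough illustration of the bot idea, here is a minimal sketch that nudges the mouse pointer at a fixed interval on the local machine. The function names (`nudge_mouse`, `keep_alive`) and the 20-minute default are my own choices for illustration, picked to stay under the 30-minute figure quoted above. Whether pointer movement alone is enough for Colab to count the session as active is an assumption I have not verified, so treat this as a starting point rather than a guaranteed fix.

```python
import time
from typing import Callable, Optional


def nudge_mouse() -> None:
    """Move the pointer a few pixels right and back again."""
    # pyautogui is a third-party package (pip install pyautogui). It is
    # imported here, not at module level, so the loop below can be
    # exercised on a machine without a display.
    import pyautogui

    pyautogui.moveRel(10, 0, duration=0.2)
    pyautogui.moveRel(-10, 0, duration=0.2)


def keep_alive(
    interval_s: float = 20 * 60,
    max_nudges: Optional[int] = None,
    action: Callable[[], None] = nudge_mouse,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait `interval_s` seconds, run `action`, and repeat.

    Runs forever when `max_nudges` is None; otherwise stops after that
    many nudges. Returns the number of nudges performed.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = 0
    while max_nudges is None or count < max_nudges:
        sleep(interval_s)
        action()
        count += 1
    return count
```

Calling `keep_alive()` with no arguments runs until interrupted with Ctrl+C. The browser window with the Colab tab has to stay open and in the foreground for the pointer movement to reach it, and on macOS the terminal usually needs accessibility permission before pyautogui can control the mouse.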
Run with Browser Closed
If you want a seamless experience with the browser window closed, plus access to better and faster GPUs, I would recommend the Colab Pro+ subscription.
But the scripting idea is there, and your mileage may vary.
I am trying to train a model for image recognition using YOLO version 3 with this notebook:
https://drive.google.com/file/d/1YnZLp6aIl-iSrL4tzVQgxJaE1N2_GfFH/view?usp=sharing
For some reason, everything works fine except the final training. The training starts, and after 5-10 minutes (the timing varies) it stops working. The browser becomes unresponsive (I am unable to do anything inside that tab), and after several minutes Colab completely disconnects.
I have tried this 10 or more times and I always get the same result. I tried it on both Chrome Canary and regular Chrome (latest versions), as well as in incognito windows, but the outcome never changes.
Any ideas? Why is that happening?
Eager to know your thoughts about this.
All the best,
Fab.
Problem solved. I tried the same process on Firefox and discovered that the auto-saving feature of Google Drive was conflicting with the process! So I simply had to use Colab's "playground" mode instead, as explained here:
https://stackoverflow.com/questions/58207750/how-to-disable-autosave-in-google-colab#:~:text=1%20Answer&text=Open%20the%20notebook%20in%20playground,Save%20a%20copy%20in%20Drive.
No idea why Chrome didn't give me any feedback about that, but Firefox saved my day!
Following @fabrizio-ferrari's answer, I disabled output saving, but the problem persisted.
Runtime -> Change runtime type -> Omit code cell output when saving this notebook
I moved to Firefox and the problem disappeared.
I am using some modelling software on Google Drive, importing the necessary Python libraries for the work I'm doing and running the third-party software (a model builder). There has never been an issue in the first two parts of this, only when running the software.
Sometimes, it crashes when I start the model builder and sometimes it crashes after some iterations (the modeling software is running anywhere from 30-1000 times depending on what type of model I'm building). So, sometimes it completes the job (as it lasts enough iterations) and other times it doesn't.
I've had the same model as this run without issue before, and I've only changed one parameter in the model, which I can't see being the cause of this error.
I understand that the error and some of the information here are quite vague, but any input on how to solve this issue would be greatly appreciated.
Update
The less populated my Google Drive is, the further the code progresses. However, I am only using 20% of my storage and am still having these problems.
Recently Google Colab has been consuming too much internet data: approximately 4 GB in 6 hours of training for a single notebook. What could be the issue?
Yes, I have the same issue. It normally works fine, but there are sudden spikes in data usage. Check this. In the process it wasted 700 MB in just 20 minutes, and I am on mobile internet, so this sometimes creates a problem. I didn't find the answer, but it seems like there is some kind of synchronization going on between the browser and the Colab platform.
One thing you could do is open the notebook in Playground mode, as shown in this link: How to remove the autosave option in Colab. This happens because Colab is saving continuously, which causes constant spikes in network traffic. That becomes a problem when you only have mobile data. So opening the notebook in Playground mode is a safe option, since the synchronization then stops.