Google Colab variable values lost after VM recycling - google-colaboratory

I am using a Google Colab Jupyter notebook for algorithm training and have been struggling with an annoying problem. Since Colab runs in a VM environment, all my variables become undefined if the session is idle for a few hours. I come back from lunch and the training dataframe, which takes a while to load, is undefined, so I have to call read_csv again to reload my dataframes.
Does anyone know how to rectify this?

If the notebook is idle for some time, it might get recycled: "Virtual machines are recycled when idle for a while" (see the Colaboratory FAQ)
There is also a hard limit on how long a virtual machine can run (up to about 12 hours).
What could also happen is that your notebook gets disconnected from the internet / Google Colab. This could be an issue with your network. Read more about this here or here
There is no way to "rectify" this, but if you have already processed some data, you can add a step that saves it to Google Drive before the session goes idle.
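For example, a minimal sketch of that save-and-reload step, assuming your DataFrame is called df and using a made-up checkpoint folder on Drive:
import os
import pandas as pd
from google.colab import drive

# Mount Google Drive into the Colab VM's filesystem (prompts for authentication).
drive.mount('/content/gdrive')

# Hypothetical checkpoint location on Drive; adjust the folder to your own layout.
checkpoint_dir = '/content/gdrive/My Drive/colab_checkpoints'
os.makedirs(checkpoint_dir, exist_ok=True)
checkpoint_path = os.path.join(checkpoint_dir, 'training_data.csv')

# After the expensive load/processing step, persist the DataFrame (df) to Drive...
df.to_csv(checkpoint_path, index=False)

# ...and after a VM recycle, reload it instead of repeating the slow read_csv/processing.
df = pd.read_csv(checkpoint_path)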

You can use a local runtime with Google Colab. That way, the notebook uses your own machine's resources and you won't run into these limits. More on this: https://research.google.com/colaboratory/local-runtimes.html
There are various ways to save your data in the process:
you can save to the notebook VM's filesystem, e.g. df.to_csv("my_data.csv")
you can import sqlite3, the Python interface to the popular SQLite database. The difference between SQLite and other SQL databases is that the DBMS runs inside your application and the data is stored in a single file on that application's file system (a short sketch follows this list). Info: https://docs.python.org/2/library/sqlite3.html
you can save to your google drive, download to your local file system through your browser, upload to GCP... more info here: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=eikfzi8ZT_rW
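As an illustration of the sqlite3 option, here is a small sketch with placeholder file, table, and column names:
import sqlite3
import pandas as pd

# Illustrative only: a tiny DataFrame standing in for your processed data.
df = pd.DataFrame({'feature': [1.0, 2.5, 3.7], 'label': [0, 1, 1]})

# The database is a single file on the VM's filesystem
# (point it at a mounted Drive path if it should survive recycling).
conn = sqlite3.connect('my_data.db')

# Write the DataFrame to a table, replacing it if it already exists.
df.to_sql('training_data', conn, if_exists='replace', index=False)

# Later, read it back from the same file.
restored = pd.read_sql('SELECT * FROM training_data', conn)
conn.close()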

Related

Use Google Colab Resources on local IDE

I have a big doubt... I see a lot of blog posts saying that you can use the Colab front-end to edit a local Jupyter notebook.
However, I don't see the point... the real advantage would be the other way around: use something like DataSpell or another local IDE on a remote notebook hosted on Colab, so the computation runs on Colab's resources and you get:
IDE-level suggestions (Colab is pretty slow compared to a local IDE)
cloud computing performance and advantages
However, I don't see any blog talking about this... is there any way to do it?

Colab: How to disconnect from session without closing the tab?

Some background
My computer fan goes crazy when I am using Google Colab; it definitely uses local resources somehow. I am running very long processes (over 4 hours). Yesterday I noticed I had been disconnected; I thought my session had crashed, since I stopped receiving status updates from my task's progress bar. But after clicking Connect to a hosted runtime I was able to reconnect to that session and interact with it just fine. Given that Google Colab uses some of my local resources, I am looking for a way to put the client application on hold for a little while.
Question
How to manually disconnect from my remote session without crashing/terminating it? Is that even possible?
Note:
There is an answer for Does Google Colab stay connected when I close my browser? that says
The current cell will continue executing once you close your browser, but the outputs will not end up in the notebook in Drive.
I would be fine with leaving the session running remotely while not being able to access the outputs in the notebook, given that I save the results to Google Drive when the process is done. So not being able to see the output in the notebook would not be an issue for me.

Does Google Colab use local resources too? How can I stop that?

I noticed that whenever I open a Google Colab notebook, my system fans go high and all four of my cores show heavy usage (on my Ubuntu laptop). Clearly a lot of JS is running on my system.
However, when I host a Jupyter notebook on another machine and use it from my laptop, resource usage is normal.
Q: Is there a way to make Google Colab use minimal resources of my PC?
While Google Colab is an awesome way to share my code (and ask questions), the fan noise annoys me a lot.
P.S. If this is not the right place to ask this, kindly let me know where I can ask it.
Check whether your Google Colab notebook is running on a local runtime. By default it runs on its own Compute Engine VM, but you have the option to change that.
P.S. It could also be Google Chrome simply using too many resources when running Colab. Try Edge or another less power-hungry browser.

Is there a way to connect google colab to my google drive for good?

I found this great question here:
https://stackoverflow.com/questions/48376580/google-colab-how-to-read-data-from-my-google-drive
which helped me connect Colab to my Drive.
Here it is as well:
from google.colab import drive
drive.mount('/content/gdrive')
My question: is there any way to do this Google authentication process only once? Colab disconnects from time to time when not in use, and then I need to restart the authentication process.
Thanks
Authentication is done per machine: keys are exchanged to grant access to Drive. Since you always get a new machine on reconnect, you need to re-authenticate.
However, another option is to use an API key for your Google Drive access. This can be done via the Google API Console for the Drive platform. Essentially you would have one API token you could use over and over again, which might tempt you to store it inside the notebook... and that is where the bad part starts.
If you opt in to using a token to "manually" mount the Drive folder, then as soon as someone gets hold of this token (e.g. by you sharing the notebook, a man-in-the-middle attack, or forgetting to delete the key), your Drive folder is compromised. That is the reason my formal answer to this question is: no, you can't.
But since Colab provides a whole machine with a Unix environment where you can execute arbitrary bash commands, you are in control, which leaves you with additional resources for further investigation:
https://stackoverflow.com/a/50888878/2763239
https://medium.com/#uditsaini/access-google-drive-and-mount-google-drive-to-colab-notebook-google-ccbca1691d31
https://github.com/googlecolab/colabtools/issues/121#issuecomment-423326300
A recently released feature makes this much simpler. The details are described in this answer:
https://stackoverflow.com/a/60103029/8841057
The short version is that for notebooks in Drive that aren't shared, there's now a GUI option to mount Drive files automatically for a given notebook.
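Within a single runtime you can also guard the mount cell so it only prompts when Drive isn't mounted yet. This is just a sketch (the mount-point check is an assumption on my part), and it does not help across VM recycles, where you still have to re-authenticate or rely on the automatic-mount option above:
import os
from google.colab import drive

# Only trigger the authentication flow if Drive isn't already mounted in this VM.
if not os.path.isdir('/content/gdrive/My Drive'):
    drive.mount('/content/gdrive')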

Is there any way to mount my local HDD/SSD to use with Google Colaboratory?

Colaboratory allows you to mount Google Drive and use data from it, but I have massive datasets (including images) on my local system that would take a long time to upload and a huge amount of space on Drive.
So I am looking for something similar, but where I mount my local system's drive instead.
One option is to run Jupyter locally and connect to it from Colab, which keeps the benefits of Drive storage and sharing for your notebooks while allowing easy access to local files.
Instructions are here: https://research.google.com/colaboratory/local-runtimes.html