Is there any way to mount my local HDD/SSD to use with Google Colaboratory? - google-colaboratory

Colaboratory lets me mount Google Drive and use data from it, but I have massive datasets (including images) on my local system that would take a long time to upload and a huge amount of space on Drive.
So I am looking for something similar, except that I want to mount my local system's drive.

One option is to run Jupyter locally and connect to it using Colab, thereby providing the benefits of Drive storage and sharing for your notebooks, but allowing easy access to local files.
Instructions are here: https://research.google.com/colaboratory/local-runtimes.html
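With a local runtime, the notebook's Python process runs on your own machine, so ordinary file paths point at your local disk. A minimal sketch of collecting image files from a local dataset folder; the function name, the extension set, and any path you pass in are illustrative assumptions, not part of Colab:

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to whatever your dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_local_images(root):
    """Return the sorted paths of image files found anywhere under `root`."""
    root = Path(root).expanduser()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

In a notebook connected to a local runtime you could then call something like list_local_images("~/datasets/my_images"), where the path is a placeholder for your own folder.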

Related

Colab: Google drive file stream access permission is a hassle. Is there a better way?

I use Google Colab extensively. To get easy access to the files in my Google Drive, I mount the drive to the file system of the virtual machine that runs Colab, like this:
from google.colab import drive as cdrive
cdrive.mount('/content/gdrive')
%cd /content/gdrive/'My Drive'/'Colab Notebooks'/my_directory
At the beginning of each session, I need to grant permission to access my drive. To do that, I have to press 'Allow', copy a one-time password, and paste it into a dedicated text area. It's a bit tedious.
Is there a better way? Can I grant permanent permission based on my machine? Any other ideas?
At the moment, you can mount Google Drive with the Mount Drive button on the left menu bar.
Confirm the prompt asking for access to Google Drive.
Drive will then be mounted in your Colab notebook.
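If you prefer to keep the mount in code, a small wrapper can skip the call when the mount folder is already populated and lets the same notebook run outside Colab. This is only a sketch: the function name is made up, and treating a non-empty mount folder as "already mounted" is an assumption rather than documented behaviour. Only the drive.mount call itself comes from the question.

```python
import os


def mount_drive_once(mount_point="/content/gdrive"):
    """Mount Google Drive unless `mount_point` already has content.

    Returns True when Drive should be available at `mount_point`, and
    False when the code is not running inside Colab.
    """
    # Assumption: a non-empty mount folder means Drive was mounted earlier.
    if os.path.isdir(mount_point) and os.listdir(mount_point):
        return True
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # still triggers the authorisation prompt
    return True
```

This does not remove the authorisation step on a fresh machine; it only avoids repeating the call within a session.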

Is there a way to connect google colab to my google drive for good?

I found this great question here:
https://stackoverflow.com/questions/48376580/google-colab-how-to-read-data-from-my-google-drive
which helped me to connect the colab to my drive
Here it is as well:
from google.colab import drive
drive.mount('/content/gdrive')
My question: Is there any way to go through the Google authentication only once? Colab disconnects from time to time when not in use, and then I need to restart the authentication process.
Thanks
Authentication is done per machine; exchanging keys to access drive. Since you always get a new machine on re-connect, you need to re-authenticate.
However, another option is to use an API key for your Google Drive access. This can be set up via the Google API Console for the Drive Platform. Essentially, you would have one API token that you can use over and over again, possibly leading you to store it inside the notebook... which is where the bad part starts.
If you opt in to using a token to "manually" mount the Drive folder, then as soon as someone gets hold of this token (e.g. through a shared notebook, a man-in-the-middle attack, or a key you forgot to delete), your Drive folder is compromised. That is why my formal answer to this question is: No, you can't.
But since Colab gives you a whole machine with a Unix environment where you can execute arbitrary bash commands, you are in control, so I will leave you with some additional resources for further investigation:
https://stackoverflow.com/a/50888878/2763239
https://medium.com/@uditsaini/access-google-drive-and-mount-google-drive-to-colab-notebook-google-ccbca1691d31
https://github.com/googlecolab/colabtools/issues/121#issuecomment-423326300
A recently released feature makes this much simpler. The details are described in this answer:
https://stackoverflow.com/a/60103029/8841057
The short version is that for notebooks in Drive that aren't shared, there's now a GUI option to mount Drive files automatically for a given notebook.

Share a part of google drive on Colab

We are sharing a Google Drive folder where we keep our Colab notebooks. Now we need to upload some text files permanently for the notebooks to use. I do not want to upload the files every time I open Colab. From what I have found, I need to upload the files to Google Drive and mount it in Colab in some way.
So, when I mount Google Drive in Colab, can my teammates access all my files in it, or only the shared folder? If it is not limited to the shared folder, is there a way to share only a folder or a file from Google Drive in Colab?
If you share a folder with your teammates in Google Drive then that folder will appear in each of their drive mounts in colab. Each person running code in a notebook (even if they share a notebook) gets their own VM. One person should never see another person's Drive mount.
An alternative to sharing a data-file folder in Drive is to upload your data to GCS (Google Cloud Storage) and have your notebook fetch it from there.
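As a rough sketch of the GCS route, assuming the objects have been made publicly readable (private buckets need authentication, which is not shown here), the notebook can download a file over plain HTTPS. The function names, bucket name, and object path are hypothetical:

```python
import urllib.parse
import urllib.request


def public_gcs_url(bucket, object_name):
    """Build the HTTPS URL of a publicly readable GCS object."""
    # quote() keeps "/" separators and escapes characters such as spaces.
    quoted = urllib.parse.quote(object_name)
    return f"https://storage.googleapis.com/{bucket}/{quoted}"


def fetch_public_object(bucket, object_name, destination):
    """Download one public object to `destination` and return that path."""
    urllib.request.urlretrieve(public_gcs_url(bucket, object_name), destination)
    return destination
```

A call such as fetch_public_object("my-team-bucket", "texts/notes.txt", "notes.txt") would then run at the top of the notebook, with the bucket and object names replaced by your own.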

Google Colab variable values lost after VM recycling

I am using a Google Colab Jupyter notebook for algorithm training and have been struggling with an annoying problem. Since Colab is running in a VM environment, all my variables become undefined if my session is idle for a few hours. I come back from lunch and the training dataframe that takes a while to load becomes undefined and I have to read_csv again to load my dataframes.
Does anyone know how to rectify this?
If the notebook is idle for some time, it might get recycled: "Virtual machines are recycled when idle for a while" (see the Colaboratory FAQ).
There is also a hard limit on how long a virtual machine can run (reportedly up to about 12 hours).
What could also happen is that your notebook gets disconnected from the internet or from Google Colab. This could be an issue with your network.
There is no way to "rectify" this, but if you have processed some data, you could add a step that saves it to Google Drive before the session goes idle.
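One way to add such a save step is a small load-or-compute helper: it reloads a pickled result if the file exists and otherwise computes it and writes it out. This is a sketch under assumptions, not a Colab feature; the helper name is made up, and the cache path is expected to point somewhere that survives VM recycling, such as a folder under your Drive mount.

```python
import os
import pickle


def load_or_compute(cache_path, compute):
    """Return the object pickled at `cache_path`, creating it first if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    # Write under a temporary name first so an interrupted save
    # never leaves a half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp_path, cache_path)
    return result
```

For the dataframe in the question, compute could be a function that calls read_csv and does the slow preprocessing, so after a recycle only the fast reload runs.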
You can use a local runtime with Google Colab. The Colab notebook will then use your own machine's resources, so the limits of Colab's hosted VMs no longer apply. More on this: https://research.google.com/colaboratory/local-runtimes.html
There are various ways to save your data in the process:
You can save to the notebook VM's filesystem, e.g. df.to_csv("my_data.csv") for a pandas DataFrame df.
You can import sqlite3, Python's built-in interface to the popular SQLite database. The difference between SQLite and other SQL databases is that the DBMS runs inside your application, and the data is saved to a file on that application's file system. Info: https://docs.python.org/2/library/sqlite3.html
You can save to your Google Drive, download to your local file system through your browser, upload to GCP... More info here: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=eikfzi8ZT_rW
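To illustrate the sqlite3 option, here is a minimal sketch that stores a few named values in a database file and reads them back. The table name, column names, and function names are made up for the example; the database path can be any writable location, including a folder on a mounted Drive.

```python
import sqlite3


def save_rows(db_path, rows):
    """Store (name, value) pairs, overwriting earlier values with the same name."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results (name TEXT PRIMARY KEY, value REAL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO results (name, value) VALUES (?, ?)", rows
            )
    finally:
        conn.close()


def load_rows(db_path):
    """Return all stored (name, value) pairs, ordered by name."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT name, value FROM results ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
```

Because everything lives in a single file, copying that file to Drive or downloading it through the browser is enough to keep the data.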

Accessing Locally stored database using google colab

I have a dataset stored locally on my laptop. Unfortunately, I can't upload it to Drive, even in zip format. How can I train my model on this locally stored dataset using Google Colab?
One option is to use Google Drive File Stream to mount your Google Drive on your local machine.
Then, you can put files there from your local machine and access them easily in Colab by mounting your Google Drive in the filesystem after running the following snippet:
from google.colab import drive
drive.mount('/content/drive')