Best Practice for Kaggle Datasets with Colab - google-colaboratory

I was wondering if anyone could confirm the best practice for downloading kaggle datasets to our colab notebooks?
I have seen code examples like the one below where we download the API token file and upload it to the environment, is that the best practice or is there a different/simpler/better approach?
Thanks in advance!
Jacob
from google.colab import files
!pip install -q kaggle
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
!kaggle datasets download -d alxmamaev/flowers-recognition

There are 2 approaches that are both convenient:
1) Save your kaggle.json in GDrive. Then mount by just clicking (in the left pane). Then, copy it here.
!mkdir -p ~/.kaggle
!cp "drive/My Drive/kaggle.json" ~/.kaggle/
# the rest is the same
2) Embed the kaggle.json in Colab itself.
!mkdir ~/.kaggle
!echo '{"username":"korakot","key":"8db2xxx"}' > ~/.kaggle/kaggle.json
# the rest is the same
If you are worried, use the first which is more secure.
If you are lazy, use the second.

Related

Rendering in Google Colab gone wrong

I am a newbie Blender user. I made my first animation yesterday and tried to render it in Google Colab. I ran a code which worked for a Youtuber who is running Blender2.91-linux version, but the same code showed error when I ran it.
I am currently using Windows 10 and really new at Blender. I need a working code that can successfully render Animation made with blender in Colab.
This is the code that I found online and ran. Please help :(
#Download Blender from Repository
!wget http://download.blender.org/release/Blender2.93/blender-2.93.0-linux-x64.tar.xz
#Install Blender
!tar xf blender-2.93.0-linux-x64.tar.xz
#Connect Google Drive
from google.colab import drive
drive.mount('/gdrive')
#Set Paths to Blender files
filename = '/gdrive/MyDrive/SHIP IN WATER With Particles.blend'
#Render an animation
!sudo ./blender-2.93.0-linux-x64/blender -b $filename -noaudio -E 'Cycles' -o '//image_####' -s 0 -e 72 -a -- --cycles-device OpenCL
The output of the last line came :
sudo: ./blender-2.93.0-linux-x64/blender: command not found
In short, I want a working code that can help me render Animation made in Blender in Google Colab.
Thank you in advance.... :)
The problem with your code is that the folder name is not the same after extracting from the tar file. Here is what you can do to fix your issue:
#Download Blender from Repository
!wget http://download.blender.org/release/Blender2.93/blender-2.93.0-linux-x64.tar.xz
#Install Blender
!tar xf blender-2.93.0-linux-x64.tar.xz
#Connect Google Drive
from google.colab import drive
drive.mount('/gdrive')
#Set Paths to Blender files
filename = '/gdrive/MyDrive/SHIP IN WATER With Particles.blend'
After that, add this command: !ls – you will get list files and folders in the current directory. Copy the extracted folder name from there and replace the old folder name with it:
OLD
./blender-2.93.0-linux-x64/blender
NEW
./NewFoldeName/blender

Setting up DeepLabV3 in colab

So I am trying to set up deeplab in colab.
I am running:
[1]
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My\ Drive/deeplab_files
[2]
%env PYTHONPATH=/content/drive/My\ Drive/deeplab_files/:/content/drive/My\ Drive/deeplab_files/slim
!echo $PYTHONPATH
[3]
!python deeplab/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--vis_crop_size=360 \
--vis_crop_size=480 \
--dataset="camvid" \
--colormap_type="pascal" \
--checkpoint_dir='/content/drive/My\ Drive/deeplab_files/deeplab/datasets/PQR/exp/train_on_trainval_set/train' \
--vis_logdir='/content/drive/My\ Drive/deeplab_files/deeplab/datasets/PQR/exp/train_on_trainval_set/vis' \
--dataset_dir='/content/drive/My\ Drive/deeplab_files/deeplab/datasets/PQR/tfrecord'
The last command, however, returns
sh: 1: export: Drive/deeplab_files/slim:/content/drive/My Drive/deeplab_files/:/content/drive/My Drive/deeplab_files/slim: bad variable name
Traceback (most recent call last):
File "deeplab/vis.py", line 28, in <module>
from deeplab import common
ModuleNotFoundError: No module named 'deeplab'
Anyone have any idea how I can set up deeplab? I have it set up on my personal machine, but it is much too slow. I uploaded the entire folder to my gdrive.
The odd thing is that I can do
from deeplab import common
from the notebook and that imports successfully
Here is a Github repo containing a Colab notebook running deeplab.
I have not tested it but the way you have uploaded your entire directory to Google Drive is not the right way to run things on Colab.
Think of Colab as a separate machine and you are mounting your Google Drive on this machine. Anything available on your Google Drive is not necessarily available to the Colab machine. You will have to add path of your Google Drive folder (say '\content\drive\My Drive\<path_to_your_folder>') to the sys.path for Colab machine using sys.path.insert(0, <path_of_your_drive_folder>) to make that path available to python environment running on the Colab machine.
Solved mt question. The linked repo that abggcv gave, unfortunately, runs into the same issue this question was citing.
You should clone the repo as normal, and run everything as normal. The only change is that before you run train.py, eval.py, or vis.py you'll need to run the following block:
%cd /root/deeplabvc/models/research/
import sys
sys.path.extend(['/root/deeplabvc/models/research/', '/root/deeplab/models/research/slim/'])
Note that /root/deeplab/ is the path to where I cloned the repo. You'll need to change this if the directory where you cloned the repo is different.
Furthermore, for some reason, you wont be able to run train.py/eval.py/vis.py successively. Even clearing the flags will give you an error about a duplicate flag. To fix this, just restart the runtime (wont lose your files).
Happy segmenting!
Deeplab import error occurs mostly when the PYTHONPATH is not setup properly. The installation instruction given does not work with COLAB environment. The Following has worked for me
%cd /content/deeplab/models/research/
!mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train
!mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval
!mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis
!echo ${PYTHONPATH}
%env PATH_TO_TRAIN_DIR=/content/deeplab/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train
%env PATH_TO_DATASET=/content/deeplab/models/research/deeplab/datasets/pascal_voc_seg/tfrecord
%env PYTHONPATH=/content/deeplab/models/research:/content/deeplab/models/research/deeplab:/content/deeplab/models/research/slim:/env/python
!echo ${PYTHONPATH}
Here is my COLAB notebook for Training of deeplab that worked

Mount multiple drives in google colab

I use this function to mount my google drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
and then copy files from it like this
!tar -C "/home/" -xvf '/content/drive/My Drive/files.tar'
I want to copy files from 2 drives, but when i try to run first script it just remount my 1st drive
How can i mount 1st drive, copy files, then mount another drive and copy files from 2nd drive?
Just in case anyone really needs to mount more than one drive, here's a workaround for mounting 2 drives.
First, mount the first drive using
from google.colab import drive
drive.mount('/drive1')
Then, use the following script to mount the second drive.
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p /drive2
!google-drive-ocamlfuse /drive2
Now, you will be able to access files from the first drive from /drive1/My Drive/ and those of the second drive from /drive2/ (the second method doesn't create the My Drive folder automatically).
Cheers!
Fun Fact: The second method was actually a commonly used method to mount Google drive in Colab environment before Google came out with google.colab.drive
The colab drive module doesn't really support what you describe.
It might be simplest to share the files/folders you want to read from the second account's Drive to the first account's Drive (e.g. drive.google.com) and then read everything from the same mount.
If you're getting an exception with Suyog Jadhav's method:
MessageError: Error: credential propagation was unsuccessful
Follow the steps 1 to 3 described by Alireza Mazochi
https://stackoverflow.com/a/69881106/10214361
Follow these steps:
1- Run the below code:
!sudo add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!sudo apt-get update -qq 2>&1 > /dev/null
!sudo apt -y install -qq google-drive-ocamlfuse 2>&1 > /dev/null
!google-drive-ocamlfuse
2- Give permissions to GFUSE
From the previous step, you get an error like this. Click on the link that locates in the previous error message and authenticate your account.
Failure("Error opening URL:https://accounts.google.com/o/oauth2/auth?client_id=... ")
3- Run the below code:
!sudo apt-get install -qq w3m # to act as web browser
!xdg-settings set default-web-browser w3m.desktop # to set default browser
%cd /content
!mkdir drive
%cd drive
!mkdir MyDrive
%cd ..
%cd ..
!google-drive-ocamlfuse /content/drive/MyDrive
After this step, you will have a folder with your second drive.
There is Rclone which is a command-line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Over 40 cloud storage products support rclone including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.
In this url you will find how to set it up in Colab. you can even link one drive and other cloud storage products too.
https://towardsdatascience.com/why-you-should-try-rclone-with-google-drive-and-colab-753f3ec04ba1

import local file to google colab

I don't understand how colab works with directories, I created a notebook, and colab put it in /Google Drive/Colab Notebooks.
Now I need to import a file (data.py) where I have a bunch of functions I need. Intuition tells me to put the file in that same directory and import it with:
import data
but apparently that's not the way...
I also tried adding the directory to the set of paths but I am specifying the directory incorrectly..
Can anyone help with this?
Thanks in advance!
Colab notebooks are stored on Google Drive. But it is run on another virtual machine. So, you need to copy your data.py there too. Do this to upload data.py through Colab.
from google.colab import files
files.upload()
# choose the file on your computer to upload it then
import data
Now google is officially providing support for accessing and working with Gdrive at ease.
You can use the below code to mount your drive to Colab:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/My\ Drive/{location you want to move}
To easily upload a local file you can use the new Google Colab feature:
click on right arrow on the left of your screen (below the Google
Colab logo)
select Files tab
click Upload button
It will open a popup to choose file to upload from your local filesystem.
To upload Local files from system to collab storage/directory.
from google.colab import files
def getLocalFiles():
_files = files.upload()
if len(_files) >0:
for k,v in _files.items():
open(k,'wb').write(v)
getLocalFiles()
So, here is how I finally solved this. I have to point out however, that in my case I had to work with several files and proprietary modules that were changing all the time.
The best solution I found to do this was to use a FUSE wrapper to "link" colab to my google account. I used this particular tool:
https://github.com/astrada/google-drive-ocamlfuse
There is an example of how to set up your environment there, but here is how I did it:
# Install a Drive FUSE wrapper.
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
At this point you'll have installed the wrapper and the code above will generate a couple of links for you to authorize access to your google drive account.
The you have to create a folder in the colab file system (remember this is not persistent, as far as I know...) and mount your drive there:
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
print ('Files in Drive:')
!ls drive/
the !ls command will print the directory contents so you can check it works, and that's it. You now have all the files you need and you can make changes to them with no further complications. Remember that you may need to restar the kernel to update the imports and variables.
Hope this works for someone!
you can write following commands in colab to mount the drive
from google.colab import drive
drive.mount('/content/gdrive')
and you can download from some external url into the drive through simple linux command wget like this
!wget 'https://dataverse.harvard.edu/dataset'

How do I upload a file to Google Colaboratory that is already on Google Drive?

https://colab.research.google.com/notebooks/io.ipynb#scrollTo=KHeruhacFpSU
In this notebook help it explains how to upload a file to drive and then download to Colaboratory but my files are already in drive.
Where can I find the file ID ?
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'
My advice would be to use pydrive for this (docs).
You could also do this via the Drive UI -- I think the shortest path is to select the file, click "Get shareable link" -- it's the id parameter in the resulting URL. (If the file wasn't shared when you started, you'll want to then uncheck the green "link" button.)
Connect to Gdrive using below snippet.
You will have to authenticate twice using the link from cell output. But once this step is taken care of you can load files from drive and save to drive directly as you would do locally.
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p drive
!google-drive-ocamlfuse drive
Read CSV using pandas
df = pd.read_csv('drive/path/file.csv')
Save CSV
Use index = False if you don't need index as first col in csv.
df.to_csv('drive/path/file.csv',index = False)
You can use curlWget extension in chrome. If you want to download anything, just click on download and as soon as it started downloading you can cancel the download. Go to curlwget and get the whole link of file or data, just copy it.
Go to colab, add a cell and paste it, just put ! mark before the copied data from curlwget.
Better to use colab api
from google.colab import drive
drive.mount('/content/drive')