Setting up DeepLabV3 in Colab - TensorFlow

So I am trying to set up deeplab in colab.
I am running:
[1]
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My\ Drive/deeplab_files
[2]
%env PYTHONPATH=/content/drive/My\ Drive/deeplab_files/:/content/drive/My\ Drive/deeplab_files/slim
!echo $PYTHONPATH
[3]
!python deeplab/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--vis_crop_size=360 \
--vis_crop_size=480 \
--dataset="camvid" \
--colormap_type="pascal" \
--checkpoint_dir='/content/drive/My\ Drive/deeplab_files/deeplab/datasets/PQR/exp/train_on_trainval_set/train' \
--vis_logdir='/content/drive/My\ Drive/deeplab_files/deeplab/datasets/PQR/exp/train_on_trainval_set/vis' \
--dataset_dir='/content/drive/My\ Drive/deeplab_files/deeplab/datasets/PQR/tfrecord'
The last command, however, returns
sh: 1: export: Drive/deeplab_files/slim:/content/drive/My Drive/deeplab_files/:/content/drive/My Drive/deeplab_files/slim: bad variable name
Traceback (most recent call last):
File "deeplab/vis.py", line 28, in <module>
from deeplab import common
ModuleNotFoundError: No module named 'deeplab'
Anyone have any idea how I can set up deeplab? I have it set up on my personal machine, but it is much too slow. I uploaded the entire folder to my gdrive.
The odd thing is that I can do
from deeplab import common
from the notebook and that imports successfully

Here is a GitHub repo containing a Colab notebook running DeepLab.
I have not tested it, but uploading your entire directory to Google Drive is not the right way to run things on Colab.
Think of Colab as a separate machine onto which you are mounting your Google Drive. Anything available on your Google Drive is not necessarily available to the Colab machine. You will have to add the path of your Google Drive folder (say '/content/drive/My Drive/<path_to_your_folder>') to sys.path on the Colab machine, using sys.path.insert(0, <path_of_your_drive_folder>), to make that path available to the Python environment running on the Colab machine.
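For example, a minimal cell along these lines should do it (the folder paths below simply reuse the deeplab_files layout from the question; adjust them to your own Drive):
import sys
# make the Drive folder (and its slim subfolder) importable in the notebook's interpreter
sys.path.insert(0, '/content/drive/My Drive/deeplab_files')
sys.path.insert(0, '/content/drive/My Drive/deeplab_files/slim')
from deeplab import common  # should now resolve from the notebook
Note that this affects the notebook's own Python process; a separate !python ... invocation reads PYTHONPATH instead.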

Solved my question. The linked repo that abggcv gave, unfortunately, runs into the same issue this question describes.
You should clone the repo as normal and run everything as normal. The only change is that before you run train.py, eval.py, or vis.py, you'll need to run the following block:
%cd /root/deeplab/models/research/
import sys
sys.path.extend(['/root/deeplab/models/research/', '/root/deeplab/models/research/slim/'])
Note that /root/deeplab/ is the path where I cloned the repo. You'll need to change this if you cloned the repo into a different directory.
Furthermore, for some reason, you won't be able to run train.py/eval.py/vis.py successively. Even clearing the flags will give you an error about a duplicate flag. To fix this, just restart the runtime (you won't lose your files).
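If you prefer to restart the runtime from code rather than from the Runtime menu, a common trick (my addition, not part of the original workflow) is to kill the kernel process; Colab restarts it automatically and files on the VM's disk survive:
import os
# deliberately crash the current kernel; Colab will restart the runtime
os.kill(os.getpid(), 9)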
Happy segmenting!

The DeepLab import error mostly occurs when PYTHONPATH is not set up properly. The installation instructions given do not work in the Colab environment. The following has worked for me:
%cd /content/deeplab/models/research/
!mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train
!mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval
!mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis
!echo ${PYTHONPATH}
%env PATH_TO_TRAIN_DIR=/content/deeplab/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train
%env PATH_TO_DATASET=/content/deeplab/models/research/deeplab/datasets/pascal_voc_seg/tfrecord
%env PYTHONPATH=/content/deeplab/models/research:/content/deeplab/models/research/deeplab:/content/deeplab/models/research/slim:/env/python
!echo ${PYTHONPATH}
Here is my Colab notebook for training DeepLab that worked.
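To show how those environment variables get used, here is a hedged sketch of a training invocation; the flag names follow the same pattern as the vis.py call in the question above, and the crop size, batch size, and split are placeholders you would tune for your own dataset:
!python deeplab/train.py \
--logtostderr \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=1 \
--dataset="pascal_voc_seg" \
--train_logdir=${PATH_TO_TRAIN_DIR} \
--dataset_dir=${PATH_TO_DATASET}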

Related

Unable to run training command for LaBSE

I am trying to reproduce the fine-tuning stage of the LaBSE model (https://github.com/tensorflow/models/tree/master/official/projects/labse).
I have cloned the tensorflow/models repository and set the environment path as follows.
%env PYTHONPATH='/env/python:/content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models'
!echo $PYTHONPATH
output: '/env/python:/content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models'
Installed the prerequisites:
!pip install -U tensorflow
!pip install -U "tensorflow-text==2.10.*"
!pip install tf-models-official
Then I try to run the LaBSE training command from the readme.md.
python3 /content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models/official/projects/labse/train.py \
--experiment=labse/train \
--config_file=/content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models/official/projects/labse/experiments/labse_bert_base.yaml \
--config_file=/content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models/official/projects/labse/experiments/labse_base.yaml \
--params_override=${PARAMS} \
--model_dir=/content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models \
--mode=train_and_eval
Issue
I get the following error.
File "/content/drive/MyDrive/Colab_Notebooks/p3_sentence_similiarity/models/official/projects/labse/train.py", line 23, in
from official.projects.labse import config_labse
ModuleNotFoundError: No module named 'official.projects.labse'
The import statement from official.projects.labse import config_labse fails.
System information
I executed this on Colab as well as on a GPU machine; in both environments I get the same error.
I need to know why the import statement fails and what corrective action should be taken.

Rendering in Google Colab gone wrong

I am a newbie Blender user. I made my first animation yesterday and tried to render it in Google Colab. I ran code that worked for a YouTuber running the Blender 2.91 Linux version, but the same code showed an error when I ran it.
I am currently using Windows 10 and am really new to Blender. I need working code that can successfully render an animation made with Blender in Colab.
This is the code that I found online and ran. Please help :(
#Download Blender from Repository
!wget http://download.blender.org/release/Blender2.93/blender-2.93.0-linux-x64.tar.xz
#Install Blender
!tar xf blender-2.93.0-linux-x64.tar.xz
#Connect Google Drive
from google.colab import drive
drive.mount('/gdrive')
#Set Paths to Blender files
filename = '/gdrive/MyDrive/SHIP IN WATER With Particles.blend'
#Render an animation
!sudo ./blender-2.93.0-linux-x64/blender -b $filename -noaudio -E 'Cycles' -o '//image_####' -s 0 -e 72 -a -- --cycles-device OpenCL
The last line produced this output:
sudo: ./blender-2.93.0-linux-x64/blender: command not found
In short, I want working code that can help me render an animation made in Blender in Google Colab.
Thank you in advance.... :)
The problem with your code is that the folder name is not the same after extracting from the tar file. Here is what you can do to fix your issue:
#Download Blender from Repository
!wget http://download.blender.org/release/Blender2.93/blender-2.93.0-linux-x64.tar.xz
#Install Blender
!tar xf blender-2.93.0-linux-x64.tar.xz
#Connect Google Drive
from google.colab import drive
drive.mount('/gdrive')
#Set Paths to Blender files
filename = '/gdrive/MyDrive/SHIP IN WATER With Particles.blend'
After that, add this command: !ls – it will list the files and folders in the current directory. Copy the extracted folder name from there and replace the old folder name with it:
OLD
./blender-2.93.0-linux-x64/blender
NEW
./NewFolderName/blender
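If you would rather not copy the name by hand, a small sketch like the following finds the binary automatically; it assumes exactly one extracted blender-* folder next to the notebook and simply reuses the render command from the question:
import glob
# locate the Blender binary inside whatever folder the tarball extracted to
blender_bin = glob.glob('./blender-*/blender')[0]
print(blender_bin)
!sudo $blender_bin -b $filename -noaudio -E 'Cycles' -o '//image_####' -s 0 -e 72 -a -- --cycles-device OpenCL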

Best Practice for Kaggle Datasets with Colab

I was wondering if anyone could confirm the best practice for downloading Kaggle datasets to our Colab notebooks.
I have seen code examples like the one below, where we download the API token file and upload it to the environment. Is that the best practice, or is there a different/simpler/better approach?
Thanks in advance!
Jacob
from google.colab import files
!pip install -q kaggle
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
!kaggle datasets download -d alxmamaev/flowers-recognition
There are 2 approaches that are both convenient:
1) Save your kaggle.json in Google Drive. Then mount the drive by just clicking (in the left pane). Then copy it over:
!mkdir -p ~/.kaggle
!cp "drive/My Drive/kaggle.json" ~/.kaggle/
# the rest is the same
2) Embed the kaggle.json in Colab itself.
!mkdir ~/.kaggle
!echo '{"username":"korakot","key":"8db2xxx"}' > ~/.kaggle/kaggle.json
# the rest is the same
If you are worried about security, use the first, which is more secure.
If you are lazy, use the second.
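Either way, once kaggle.json is in place the rest of the flow is the same as in the question. For example (the dataset slug is taken from the question; the unzip target folder is just an example, and the zip file is normally named after the dataset slug):
!kaggle datasets download -d alxmamaev/flowers-recognition
!unzip -q flowers-recognition.zip -d flowers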

import local file to google colab

I don't understand how Colab works with directories. I created a notebook, and Colab put it in /Google Drive/Colab Notebooks.
Now I need to import a file (data.py) where I have a bunch of functions I need. Intuition tells me to put the file in that same directory and import it with:
import data
but apparently that's not the way...
I also tried adding the directory to the set of paths, but I am specifying the directory incorrectly.
Can anyone help with this?
Thanks in advance!
Colab notebooks are stored on Google Drive, but they run on a separate virtual machine. So you need to copy your data.py there too. Do this to upload data.py through Colab:
from google.colab import files
files.upload()
# choose the file on your computer to upload it then
import data
Google now officially provides support for accessing and working with Google Drive with ease.
You can use the code below to mount your drive in Colab:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/My\ Drive/{location you want to move}
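If data.py lives in a Drive folder rather than in the current working directory, you can also make it importable by putting that folder on sys.path. A minimal sketch, assuming the mount point from the cell above and that data.py sits in the default 'Colab Notebooks' folder:
import sys
# folder that contains data.py; adjust to wherever you keep it on Drive
sys.path.append('/gdrive/My Drive/Colab Notebooks')
import data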
To easily upload a local file you can use the new Google Colab feature:
click the right arrow on the left of your screen (below the Google Colab logo)
select the Files tab
click the Upload button
It will open a popup to choose a file to upload from your local filesystem.
To upload local files from your system to the Colab storage/directory:
from google.colab import files
def getLocalFiles():
    _files = files.upload()
    if len(_files) > 0:
        for k, v in _files.items():
            open(k, 'wb').write(v)
getLocalFiles()
So, here is how I finally solved this. I have to point out, however, that in my case I had to work with several files and proprietary modules that were changing all the time.
The best solution I found was to use a FUSE wrapper to "link" Colab to my Google account. I used this particular tool:
https://github.com/astrada/google-drive-ocamlfuse
There is an example of how to set up your environment there, but here is how I did it:
# Install a Drive FUSE wrapper.
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
At this point you'll have installed the wrapper, and the code above will generate a couple of links for you to authorize access to your Google Drive account.
Then you have to create a folder in the Colab file system (remember this is not persistent, as far as I know...) and mount your drive there:
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
print ('Files in Drive:')
!ls drive/
The !ls command will print the directory contents so you can check it works, and that's it. You now have all the files you need and you can make changes to them with no further complications. Remember that you may need to restart the kernel to update the imports and variables.
Hope this works for someone!
You can write the following commands in Colab to mount the drive:
from google.colab import drive
drive.mount('/content/gdrive')
and you can download from some external URL into the drive through the simple Linux command wget, like this:
!wget 'https://dataverse.harvard.edu/dataset'
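If you want the download to land inside the mounted Drive rather than on the Colab VM's local disk, wget's -P option sets the target directory. A sketch, assuming the mount point from the cell above and a hypothetical 'datasets' folder on your Drive (the URL is kept from the example above):
!wget -P '/content/gdrive/My Drive/datasets' 'https://dataverse.harvard.edu/dataset'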

Shipping and using virtualenv in a pyspark job

PROBLEM: I am attempting to run a spark-submit script from my local machine to a cluster of machines. The work done by the cluster uses numpy. I currently get the following error:
ImportError:
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control). Otherwise reinstall numpy.
Original error was: cannot import name multiarray
DETAIL:
In my local environment I have set up a virtualenv that includes numpy as well as a private repo I use in my project, plus various other libraries. I created a zip file (lib/libs.zip) from the site-packages directory at venv/lib/site-packages, where 'venv' is my virtual environment. I ship this zip to the remote nodes. My shell script for performing the spark-submit looks like this:
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--conf spark.pyspark.virtualenv.enabled=true \
--conf spark.pyspark.virtualenv.type=native \
--conf spark.pyspark.virtualenv.requirements=${parent}/requirements.txt \
--conf spark.pyspark.virtualenv.bin.path=${parent}/venv \
--py-files "${parent}/lib/libs.zip" \
--num-executors 1 \
--executor-cores 2 \
--executor-memory 2G \
--driver-memory 2G \
$parent/src/features/pi.py
I also know that on the remote nodes there is a /usr/local/bin/python2.7 folder that includes a Python 2.7 install.
So in my conf/spark-env.sh I have set the following:
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
When I run the script I get the error above. If I print the installed_distributions to the screen, I get a zero-length list []. Also, my private library imports correctly (which tells me it is actually accessing my libs.zip site-packages). My pi.py file looks something like this:
from myprivatelibrary.bigData.spark import spark_context
spark = spark_context()
import numpy as np
spark.parallelize(range(1, 10)).map(lambda x: np.__version__).collect()
EXPECTATION/MY THOUGHTS:
I expect this to import numpy correctly, especially since I know numpy works correctly in my local virtualenv. I suspect this is because I'm not actually using the version of Python that is installed in my virtualenv on the remote node. My question is: first, how do I fix this, and second, how do I use the Python installed in my virtualenv on the remote nodes instead of the Python that is manually installed and currently sitting on those machines? I've seen some write-ups on this, but frankly they are not well written.
With --conf spark.pyspark.{} and export PYSPARK_PYTHON=/usr/local/bin/python2.7 you set options for your local environment / your driver. To set options for the cluster (executors), use the following syntax:
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON
Furthermore, I guess you should make your virtualenv relocatable (this is experimental, however). <edit 20170908> This means that the virtualenv uses relative instead of absolute links. </edit>
What we did in such cases: we shipped an entire anaconda distribution over hdfs.
<edit 20170908>
If we are talking about different environments (macOS vs. Linux, as mentioned in the comment below), you cannot just submit a virtualenv, at least not if your virtualenv contains packages with binaries (as is the case with numpy). In that case I suggest you create yourself a 'portable' Anaconda, i.e. install Anaconda in a Linux VM and zip it.
Regarding --archives vs. --py-files:
--py-files adds python files/packages to the python path. From the spark-submit documentation:
For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
--archives means these are extracted into the working directory of each executor (only yarn clusters).
However, a crystal-clear distinction is lacking, in my opinion - see for example this SO post.
In the given case, add the anaconda.zip via --archives, and your 'other python files' via --py-files.
</edit>
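Putting those pieces together, a hedged sketch of what the submit command might look like (the archive location, the #ANACONDA alias, and the interpreter path inside the zip are assumptions you will need to adapt; the key ideas are shipping the archive with --archives and pointing the executors at the shipped interpreter via spark.yarn.appMasterEnv / spark.executorEnv):
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./ANACONDA/anaconda/bin/python \
--conf spark.executorEnv.PYSPARK_PYTHON=./ANACONDA/anaconda/bin/python \
--archives hdfs:///path/to/anaconda.zip#ANACONDA \
--py-files "${parent}/lib/libs.zip" \
$parent/src/features/pi.py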
See also: Running Pyspark with Virtualenv, a blog post by Henning Kropp.