Changing directory in Google Colab (breaking out of the Python interpreter) - google-colaboratory

So I'm trying to git clone a repository and cd into that directory using Google Colab, but I can't cd into it. What am I doing wrong?
!rm -rf SwitchFrequencyAnalysis && git clone https://github.com/ACECentre/SwitchFrequencyAnalysis.git
!cd SwitchFrequencyAnalysis
!ls
datalab/ SwitchFrequencyAnalysis/
You would expect it to output the directory contents of SwitchFrequencyAnalysis, but instead it lists the root. I feel like I'm missing something obvious. Is it something to do with being within the Python interpreter? (Where is the documentation?)
Demo here.

Use
%cd SwitchFrequencyAnalysis
to change the current working directory for the notebook environment (and not just the subshell that runs your ! command).
You can confirm it worked with the pwd command, like this:
!pwd
Further information about Jupyter/IPython magics:
http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd
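Applied to the question above, a minimal cell might look like this (assuming the clone succeeds and creates a SwitchFrequencyAnalysis folder):
!rm -rf SwitchFrequencyAnalysis && git clone https://github.com/ACECentre/SwitchFrequencyAnalysis.git
# %cd changes the notebook's working directory, so later cells see it too
%cd SwitchFrequencyAnalysis
# this now lists the contents of SwitchFrequencyAnalysis
!ls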

As others have pointed out, the cd command needs to start with a percentage sign:
%cd SwitchFrequencyAnalysis
Difference between % and !
Google Colab seems to inherit these syntaxes from Jupyter (which inherits them from IPython).
Jake VanderPlas explains this IPython behaviour here. You can see the excerpt below.
If you play with IPython's shell commands for a while, you might notice that you cannot use !cd to navigate the filesystem:
In [11]: !pwd
/home/jake/projects/myproject
In [12]: !cd ..
In [13]: !pwd
/home/jake/projects/myproject
The reason is that shell commands in the notebook are executed in a temporary subshell. If you'd like to change the working directory in a more enduring way, you can use the %cd magic command:
In [14]: %cd ..
/home/jake/projects
Another way to look at this: you need % because changing directory is relevant to the environment of the current notebook but not to the entire server runtime.
In general, use ! if the command is one that's fine to run in a separate shell. Use % if the command needs to affect the environment of the notebook itself.
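A quick sketch of that difference from Python's point of view (the paths here are just illustrative):
import os
print(os.getcwd())   # e.g. /content
!cd /tmp             # runs in a throwaway subshell, so it has no lasting effect
print(os.getcwd())   # still /content
%cd /tmp             # changes the notebook's own working directory
print(os.getcwd())   # now /tmp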

Use os.chdir. Here's a full example:
https://colab.research.google.com/notebook#fileId=1CSPBdmY0TxU038aKscL8YJ3ELgCiGGju
Compactly:
!mkdir abc
!echo "file" > abc/123.txt
import os
os.chdir('abc')
# Now 'abc' is the current working directory,
# so the listing below will show 123.txt.
!ls
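Continuing from the cell above, you can confirm or undo the change in plain Python:
print(os.getcwd())   # path now ends with /abc
os.chdir('..')       # go back up one level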

If you want to use cd or ls, you need the proper prefix before the command (% and ! respectively).
Use %cd and !ls to navigate:
!ls                  # to see what's in the directory you're in
%cd ./samplefolder   # to go into a folder (say, samplefolder)
or, if you want to go up out of the current folder:
%cd ../
and then navigate to the required folder/file accordingly.

!pwd
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/Data')
!pwd
View this answer for a detailed explanation:
https://stackoverflow.com/a/61636734/11535267

I believe you'd have to mount Google Drive first before you do anything else.
from google.colab import drive
drive.mount('/content/drive')
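After mounting, your files appear under /content/drive/My Drive, so you can then change into it (the exact subfolder is up to you):
%cd /content/drive/My\ Drive
!ls   # should list the top level of your Drive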

Related

Rendering in Google Colab gone wrong

I am a newbie Blender user. I made my first animation yesterday and tried to render it in Google Colab. I ran code that worked for a YouTuber running the Blender 2.91 Linux version, but the same code showed an error when I ran it.
I am currently using Windows 10 and am really new to Blender. I need working code that can successfully render an animation made with Blender in Colab.
This is the code that I found online and ran. Please help :(
#Download Blender from Repository
!wget http://download.blender.org/release/Blender2.93/blender-2.93.0-linux-x64.tar.xz
#Install Blender
!tar xf blender-2.93.0-linux-x64.tar.xz
#Connect Google Drive
from google.colab import drive
drive.mount('/gdrive')
#Set Paths to Blender files
filename = '/gdrive/MyDrive/SHIP IN WATER With Particles.blend'
#Render an animation
!sudo ./blender-2.93.0-linux-x64/blender -b $filename -noaudio -E 'Cycles' -o '//image_####' -s 0 -e 72 -a -- --cycles-device OpenCL
The output of the last line was:
sudo: ./blender-2.93.0-linux-x64/blender: command not found
In short, I want working code that can help me render an animation made in Blender in Google Colab.
Thank you in advance.... :)
The problem with your code is that the folder name is not the same after extracting the tar file. Here is what you can do to fix your issue:
#Download Blender from Repository
!wget http://download.blender.org/release/Blender2.93/blender-2.93.0-linux-x64.tar.xz
#Install Blender
!tar xf blender-2.93.0-linux-x64.tar.xz
#Connect Google Drive
from google.colab import drive
drive.mount('/gdrive')
#Set Paths to Blender files
filename = '/gdrive/MyDrive/SHIP IN WATER With Particles.blend'
After that, add the command !ls; it will list the files and folders in the current directory. Copy the extracted folder name from there and replace the old folder name with it:
OLD
./blender-2.93.0-linux-x64/blender
NEW
./NewFolderName/blender
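If you'd rather not copy the folder name by hand, a small sketch like this can locate the binary after extraction (it assumes the extracted folder name starts with blender-):
import glob
# find whichever folder the tar file actually extracted to
blender_bin = glob.glob('./blender-*/blender')[0]
print(blender_bin)
!sudo $blender_bin -b $filename -noaudio -E 'Cycles' -o '//image_####' -s 0 -e 72 -a -- --cycles-device OpenCL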

How to make persistent installation with Colaboratory?

Is there a way to have a persistent installation on the machine you get when you launch a notebook from the Colaboratory environment?
There is such a mechanism with mybinder.org, where a requirements.txt or setup.py specifies the packages you want at startup.
https://mybinder.readthedocs.io/en/latest/config_files.html#requirements-txt-install-a-python-environment
I have tested a Colab notebook with an installation procedure, but I have to rerun a sequence of cells each time I want to work.
Try:
https://colab.research.google.com/drive/1u5Y-92-b4rVcJjkUpPPa5xnuvKAHcnNa
Also, how do I define environment variables once (at startup)?
Do I also have to re-set them each time?
Thanks
Patrick
Here's how I install a library (jdc) permanently, by installing it into Google Drive.
import os, sys
from google.colab import drive
drive.mount('/content/mnt')
nb_path = '/content/notebooks'
os.symlink('/content/mnt/My Drive/Colab Notebooks', nb_path)
sys.path.insert(0, nb_path)
# call this one time only
!pip install --target=$nb_path jdc
# later just import it
import jdc # for %%add_to
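In later sessions you don't need to reinstall; a minimal sketch of the cell to rerun, assuming the same paths as above:
import os, sys
from google.colab import drive
drive.mount('/content/mnt')
nb_path = '/content/notebooks'
if not os.path.exists(nb_path):
    # recreate the symlink on a fresh VM
    os.symlink('/content/mnt/My Drive/Colab Notebooks', nb_path)
sys.path.insert(0, nb_path)
import jdc  # already installed into Drive, so no pip needed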
And here's how to set environment variables.
%env VAR1=value1
%env VAR2=value2
Put them in your first cell and run it.
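%env writes into the notebook's own environment, so you can check the values from Python:
import os
print(os.environ['VAR1'])   # value1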

Couldn't open file: data/obj.data and /bin/bash: ./darknet: No such file or directory

I tried to train the custom object according to https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects and I got an error.
Please tell me any good solutions.
I am going with google colaboratory.
I changed the directory, but it does not change.
dir
--content/
  ┠ darknet-master/
    ┠ build/
      ┠ darknet/
        ┠ x64/
          ┠ data/
            ┠ obj.data
%%bash
cd /content/darknet-master
./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74 > train_log.txt
Couldn't open file: data/obj.dat
%cd /content/darknet-master/build/darknet/x64
!./darknet detect cfg/yolov3.cfg yolov3.weights data/person.jpg
/content/darknet-master/build/darknet/x64
/bin/bash: ./darknet: No such file or directory
Use %cd or os.chdir rather than %%bash cd...
The reason is that %%bash runs commands in a subshell. But what you want to do is change the working directory of the Python backend running your code.
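Applied to the commands above, a corrected cell might look like this (assuming the darknet binary was built in /content/darknet-master):
# change the notebook's working directory; this persists across cells
%cd /content/darknet-master
# relative paths like data/obj.data now resolve from here
!./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74 > train_log.txt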

import local file to google colab

I don't understand how Colab works with directories. I created a notebook, and Colab put it in /Google Drive/Colab Notebooks.
Now I need to import a file (data.py) where I have a bunch of functions I need. Intuition tells me to put the file in that same directory and import it with:
import data
but apparently that's not the way...
I also tried adding the directory to the set of paths, but I am specifying the directory incorrectly.
Can anyone help with this?
Thanks in advance!
Colab notebooks are stored on Google Drive, but they run on another virtual machine. So you need to copy your data.py there too. Do this to upload data.py through Colab:
from google.colab import files
files.upload()
# choose the file on your computer to upload, then
import data
Google now officially provides support for accessing and working with Google Drive with ease.
You can use the below code to mount your drive to Colab:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/My\ Drive/{location you want to move}
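If the goal is simply to make import data work, one option (assuming data.py sits next to the notebook in Colab Notebooks) is to add that Drive folder to sys.path after mounting:
import sys
from google.colab import drive
drive.mount('/gdrive')
# make the folder containing data.py importable; adjust the path if yours differs
sys.path.append('/gdrive/My Drive/Colab Notebooks')
import data   # now resolves to /gdrive/My Drive/Colab Notebooks/data.py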
To easily upload a local file you can use the new Google Colab feature:
click on the right arrow on the left of your screen (below the Google Colab logo)
select the Files tab
click the Upload button
It will open a popup to choose file to upload from your local filesystem.
To upload local files from your system to the Colab storage/directory:
from google.colab import files
def getLocalFiles():
    _files = files.upload()
    if len(_files) > 0:
        for k, v in _files.items():
            open(k, 'wb').write(v)
getLocalFiles()
So, here is how I finally solved this. I have to point out, however, that in my case I had to work with several files and proprietary modules that were changing all the time.
The best solution I found was to use a FUSE wrapper to "link" Colab to my Google account. I used this particular tool:
https://github.com/astrada/google-drive-ocamlfuse
There is an example of how to set up your environment there, but here is how I did it:
# Install a Drive FUSE wrapper.
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
At this point you'll have installed the wrapper, and the code above will generate a couple of links for you to authorize access to your Google Drive account.
Then you have to create a folder in the Colab file system (remember this is not persistent, as far as I know...) and mount your drive there:
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
print ('Files in Drive:')
!ls drive/
The !ls command will print the directory contents so you can check it works, and that's it. You now have all the files you need and you can make changes to them with no further complications. Remember that you may need to restart the kernel to update the imports and variables.
Hope this works for someone!
You can write the following commands in Colab to mount the drive:
from google.colab import drive
drive.mount('/content/gdrive')
and you can download from an external URL into the drive with a simple Linux command like wget:
!wget 'https://dataverse.harvard.edu/dataset'
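Note that wget saves into the current working directory by default; to put the file straight into Drive you can pass a target directory (the URL and filename here are placeholders):
# -P sets the download directory; quote the path because of the space in "My Drive"
!wget -P '/content/gdrive/My Drive/' 'https://example.com/somefile.csv'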

Shipping and using virtualenv in a pyspark job

PROBLEM: I am attempting to run a spark-submit script from my local machine to a cluster of machines. The work done by the cluster uses numpy. I currently get the following error:
ImportError:
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control). Otherwise reinstall numpy.
Original error was: cannot import name multiarray
DETAIL:
In my local environment I have set up a virtualenv that includes numpy as well as a private repo I use in my project and various other libraries. I created a zip file (lib/libs.zip) from the site-packages directory at venv/lib/site-packages, where 'venv' is my virtual environment. I ship this zip to the remote nodes. My shell script for performing the spark-submit looks like this:
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--conf spark.pyspark.virtualenv.enabled=true \
--conf spark.pyspark.virtualenv.type=native \
--conf spark.pyspark.virtualenv.requirements=${parent}/requirements.txt \
--conf spark.pyspark.virtualenv.bin.path=${parent}/venv \
--py-files "${parent}/lib/libs.zip" \
--num-executors 1 \
--executor-cores 2 \
--executor-memory 2G \
--driver-memory 2G \
$parent/src/features/pi.py
I also know that on the remote nodes there is a /usr/local/bin/python2.7 folder that includes a python 2.7 install.
So in my conf/spark-env.sh I have set the following:
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
When I run the script I get the error above. If I screen-print the installed_distributions I get a zero-length list []. Also, my private library imports correctly (which says to me it is actually accessing my libs.zip site-packages). My pi.py file looks something like this:
from myprivatelibrary.bigData.spark import spark_context
spark = spark_context()
import numpy as np
spark.parallelize(range(1, 10)).map(lambda x: np.__version__).collect()
EXPECTATION/MY THOUGHTS:
I expect this to import numpy correctly, especially since I know numpy works correctly in my local virtualenv. I suspect this is because I'm not actually using the version of Python that is installed in my virtualenv on the remote node. My questions are: first, how do I fix this, and second, how do I use my virtualenv-installed Python on the remote nodes instead of the Python that is just manually installed and currently sitting on those machines? I've seen some write-ups on this, but frankly they are not well written.
With --conf spark.pyspark.{...} and export PYSPARK_PYTHON=/usr/local/bin/python2.7 you set options for your local environment / your driver. To set options for the cluster (executors), use the following syntax:
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON
Furthermore, I guess you should make your virtualenv relocatable (this is experimental, however). <edit 20170908> This means that the virtualenv uses relative instead of absolute links. </edit>
What we did in such cases: we shipped an entire anaconda distribution over hdfs.
<edit 20170908>
If we are talking about different environments (MacOs vs. Linux, as mentioned in the comment below), you cannot just submit a virtualenv, at least not if your virtualenv contains packages with binaries (as is the case with numpy). In that case I suggest you create yourself a 'portable' anaconda, i.e. install Anaconda in a Linux VM and zip it.
Regarding --archives vs. --py-files:
--py-files adds python files/packages to the python path. From the spark-submit documentation:
For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
--archives means these are extracted into the working directory of each executor (only yarn clusters).
However, a crystal-clear distinction is lacking, in my opinion - see for example this SO post.
In the given case, add the anaconda.zip via --archives, and your 'other python files' via --py-files.
</edit>
See also: Running Pyspark with Virtualenv, a blog post by Henning Kropp.