mkdir and joinpath don't work in Google Colab - google-colaboratory

save_dir = Path('/OpenPose_Pose_transfer/data/source/roy')
save_dir.mkdir(exist_ok=True)
img_dir = save_dir.joinpath('images')
img_dir.mkdir(exist_ok=True)
%cd ~/OpenPose_Pose_transfer/data/source/roy/
%cd ~/OpenPose_Pose_transfer/data/source/images/
No such file or directory: '/root/OpenPose_Pose_transfer/data/source/roy/'
/root/OpenPose_Pose_transfer
No such file or directory: '/root/OpenPose_Pose_transfer/data/source/images/'
/root/OpenPose_Pose_transfer

From pathlib docs:
If parents is true, any missing parents of this path are created as
needed; they are created with the default permissions without taking
mode into account (mimicking the POSIX mkdir -p command).
Since the parent directories probably do not exist yet in your case, pass parents=True to mkdir(). Also, the last line should be %cd /OpenPose_Pose_transfer/data/source/roy/images/, since images is a subdirectory of roy.
Also, since you are creating the OpenPose_Pose_transfer directory at /, do not prefix the path with ~ (which expands to /root in Colab).
Changed code:
from pathlib import Path
save_dir = Path('/OpenPose_Pose_transfer/data/source/roy')
save_dir.mkdir(parents=True, exist_ok=True)
img_dir = save_dir.joinpath('images')
img_dir.mkdir(exist_ok=True)
%cd /OpenPose_Pose_transfer/data/source/roy/
%cd /OpenPose_Pose_transfer/data/source/roy/images/
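The effect of parents=True can be checked outside Colab with a minimal sketch like the one below. It assumes a temporary directory as a stand-in for /, and the variable failed_without_parents is introduced here only for illustration.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for the filesystem root used in the answer.
base = Path(tempfile.mkdtemp())
save_dir = base / "OpenPose_Pose_transfer" / "data" / "source" / "roy"

try:
    # Without parents=True, mkdir() raises because the intermediate
    # directories (OpenPose_Pose_transfer/data/source) do not exist yet.
    save_dir.mkdir(exist_ok=True)
    failed_without_parents = False
except FileNotFoundError:
    failed_without_parents = True

# parents=True creates every missing directory along the path.
save_dir.mkdir(parents=True, exist_ok=True)

# roy now exists, so creating images inside it needs no parents=True.
img_dir = save_dir.joinpath("images")
img_dir.mkdir(exist_ok=True)

print(failed_without_parents, img_dir.is_dir())
```

Note that exist_ok=True only suppresses the error for a directory that already exists; it does not create missing parents.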

Related

Why does Colab extract a Google Drive .tar file to /content directory (and not to Google Drive)?

I use Colab. I am amazed by the following behavior:
drive.mount('/content/drive')
os.listdir("/content/drive/MyDrive/Colab Notebooks")
or
!ls -al /content/drive/MyDrive/Colab\ Notebooks
returns: ['FGNET_all.tar']
When I extract FGNET_all.tar
!tar xvf /content/drive/MyDrive/Colab\ Notebooks/FGNET_all.tar
FGNET is extracted to /content directory:
os.listdir("/content/drive/MyDrive/Colab Notebooks")
=> Only ['FGNET_all.tar'] not FGNET directory
os.listdir("/content")
or
!ls -al /content
returns
['.config', 'drive', '._FGNET', 'FGNET'] => FGNET directory
Why?
Unless a destination is specified, the tar command extracts the contents of the archive into the current working directory. The default working directory in Google Colab is /content, which is why your extracted files end up there.
You can specify the destination directory with the -C option, as follows:
!tar xvf /content/drive/MyDrive/Colab\ Notebooks/FGNET_all.tar -C /content/drive/MyDrive/Colab\ Notebooks/
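The same idea can be expressed in Python with the standard tarfile module. This is only a sketch under assumptions: it builds a small stand-in archive in a temporary directory rather than using the real FGNET_all.tar, and the directory names are made up for the example.

```python
import tarfile
import tempfile
from pathlib import Path

work = Path(tempfile.mkdtemp())

# Build a small archive to extract (a stand-in for FGNET_all.tar).
src = work / "FGNET"
src.mkdir()
(src / "sample.txt").write_text("data")
archive = work / "FGNET_all.tar"
with tarfile.open(archive, "w") as tar:
    tar.add(src, arcname="FGNET")

# Extract into an explicit destination instead of the working directory;
# path= plays the same role as tar's -C option.
dest = work / "extracted"
dest.mkdir()
with tarfile.open(archive) as tar:
    tar.extractall(path=dest)

extracted_file = dest / "FGNET" / "sample.txt"
print(extracted_file.exists())
```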

How to use TreeTagger in Google Colab?

I want to use the TreeTagger module to tag POS information on a raw corpus.
Since using a GPU via Google Colab seems to be faster, I installed the TreeTagger module, but my code in Colab cannot locate the TreeTagger directory.
The error is:
TreeTaggerError: Can't locate TreeTagger directory (and no TAGDIR specified)
Please tell me where I should upload the treetagger folder.
You have to specify the directory:
treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/') # treetagger is the installation dir
Installation in Colab:
Follow the instructions on the TreeTagger website.
Put the following in one Colab cell (for languages other than English, use the link to the corresponding parameter file):
%%bash
mkdir treetagger
cd treetagger
# Download the tagger package for your system (PC-Linux, Mac OS-X, ARM64, ARMHF, ARM-Android, PPC64le-Linux).
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.4.tar.gz
tar -xzvf tree-tagger-linux-3.2.4.tar.gz
# Download the tagging scripts into the same directory.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
gunzip tagger-scripts.tar.gz
# Download the installation script install-tagger.sh.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/install-tagger.sh
# Download the parameter files for the languages you want to process.
# list of all files (parameter files) https://cis.lmu.de/~schmid/tools/TreeTagger/#parfiles
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/english.par.gz
sh install-tagger.sh
cd ..
sudo pip install treetaggerwrapper
In the next cell you can check the installation:
>>> import pprint # For proper print of sequences.
>>> import treetaggerwrapper
>>> #1) build a TreeTagger wrapper:
>>> tagger = treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/')
>>> #2) tag your text.
>>> tags = tagger.tag_text("This is a very short text to tag.")
>>> #3) use the tags list... (list of string output from TreeTagger).
>>> pprint.pprint(tags)

Stuck in datalab directory, can't navigate to my desired directory

I have a directory named ML in my Drive, which contains some of my Colab notebooks and some CSV files that I want to load in my notebook.
But when I use !pwd to find the current folder, its output is as follows.
!pwd
/content
When I use !ls .., the directory structure is as follows:
bin/
boot/
colabtools/
content/
datalab/
dev/
etc/
gpu-tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
gpu-tensorflow-1.9.0-cp36-cp36m-linux_x86_64.whl
home/
lib/
lib64/
media/
mnt/
opt/
proc/
root/
run/
sbin/
srv/
sys/
tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
tensorflow-1.9.0-cp36-cp36m-linux_x86_64.whl
tf_deps/
tmp/
tools/
usr/
var/
I am unable to !cd to my directory, drive/ML, since I don't know the path or the tree structure, and so I cannot load my CSV files.
Use %cd instead of !cd, like this:
%cd drive/ML
In such cases we need to mount Google Drive. The current working directory is /content:
os.getcwd()
/content
Once Drive is mounted, /content contains a drive folder:
os.listdir('/content')
datalab drive
If Google Drive isn't mounted, the /content directory will contain only the datalab folder.
To mount your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
For further information refer here
Afterwards, to change directory use:
import os
os.chdir("drive/My Drive/ML")
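When the exact path is unknown, one option is to search for the folder by name before changing into it. The sketch below is an illustration, not part of the original answers: find_dirs is a hypothetical helper, and a temporary directory stands in for /content/drive.

```python
import os
import tempfile


def find_dirs(root, name):
    """Return the paths of all directories called `name` found under `root`."""
    matches = []
    for current, dirnames, _filenames in os.walk(root):
        if name in dirnames:
            matches.append(os.path.join(current, name))
    return matches


# Stand-in for a mounted Drive with an ML folder somewhere inside it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "My Drive", "ML"))

found = find_dirs(root, "ML")
print(found)
```

In Colab the search root would be the mount point, and the first match could then be passed to os.chdir. Walking a large Drive can be slow, so this is best used once to discover the path.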

Changing directory in Google colab (breaking out of the python interpreter)

So I'm trying to git clone a repository and cd into its directory using Google Colab, but I can't cd into it. What am I doing wrong?
!rm -rf SwitchFrequencyAnalysis && git clone https://github.com/ACECentre/SwitchFrequencyAnalysis.git
!cd SwitchFrequencyAnalysis
!ls
datalab/ SwitchFrequencyAnalysis/
You would expect it to output the directory contents of SwitchFrequencyAnalysis, but instead it's the root. I feel I'm missing something obvious. Is it something to do with being within the Python interpreter? (Where is the documentation?)
Demo here.
Use
%cd SwitchFrequencyAnalysis
to change the current working directory for the notebook environment (and not just the subshell that runs your ! command).
You can confirm it worked with the pwd command, like this:
!pwd
Further information about Jupyter / IPython magics:
http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd
As others have pointed out, the cd command needs to start with a percentage sign:
%cd SwitchFrequencyAnalysis
Difference between % and !
Google Colab seems to inherit this syntax from Jupyter (which inherits it from IPython).
Jake VanderPlas explains this IPython behaviour. You can see the excerpt below.
If you play with IPython's shell commands for a while, you might
notice that you cannot use !cd to navigate the filesystem:
In [11]: !pwd
/home/jake/projects/myproject
In [12]: !cd ..
In [13]: !pwd
/home/jake/projects/myproject
The reason is that
shell commands in the notebook are executed in a temporary subshell.
If you'd like to change the working directory in a more enduring way,
you can use the %cd magic command:
In [14]: %cd ..
/home/jake/projects
Another way to look at this: you need % because changing directory is relevant to the environment of the current notebook but not to the entire server runtime.
In general, use ! if the command is one that's okay to run in a separate shell. Use % if the command needs to be run on the specific notebook.
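The subshell behaviour can be reproduced in plain Python, without any notebook magics. The sketch below assumes a POSIX shell (as in Colab): subprocess.run plays the role of a ! command, while os.chdir changes the directory of the current process, which is roughly what %cd does.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = os.path.realpath(tempfile.mkdtemp())

# Like "!cd": the cd runs in a child shell, which exits immediately,
# so the working directory of this Python process is untouched.
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_subshell = os.getcwd()

# Like "%cd": changes the working directory of the current process.
os.chdir(target)
after_chdir = os.getcwd()

os.chdir(start)  # restore the original directory

print(after_subshell == start)
print(os.path.realpath(after_chdir) == target)
```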
Use os.chdir. Here's a full example:
https://colab.research.google.com/notebook#fileId=1CSPBdmY0TxU038aKscL8YJ3ELgCiGGju
Compactly:
!mkdir abc
!echo "file" > abc/123.txt
import os
os.chdir('abc')
# Now the directory 'abc' is the current working directory.
# and will show 123.txt.
!ls
If you want to use cd or ls, you need the proper prefix before the command name (% and ! respectively).
Use %cd and !ls to navigate.
To list the contents of the directory you're in:
!ls
To go into a folder (say, samplefolder):
%cd ./samplefolder
Or, to go up out of the current folder:
%cd ../
Then navigate to the required folder or file accordingly.
!pwd
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/Data')
!pwd
See this answer for a detailed explanation:
https://stackoverflow.com/a/61636734/11535267
I believe you'd have to mount the Google Drive first before you do anything else.
from google.colab import drive
drive.mount('/content/drive')

scp: how to transfer files from different directories preserving the whole path

I need to transfer all *.png files from different directories on a remote server while preserving the full path of each .png file, because all the .png files have the same name.
scp -r -e server:coverages/K4me3/*/pos/output/*/*.png Desktop/
While copying, it overwrites already existing .png files because their names are the same in different directories. I want to preserve the full directory path, so that the .png files are copied into their own directories.
SCP doesn't preserve the paths of files on its own, as you have discovered.
You'll probably want to use rsync to do this, since rsync does preserve paths.
I think the command would be:
rsync -a -r -v -z server_config:/path/to/root/directory/on/server [destination_folder]
This is the reverse of this question: scp a folder to a remote system keeping the directory layout
Alternatively, and as the comments suggest, you can write a script to get all of the files or lower level directories (with absolute path) and call an scp transfer on each of them. Here is a script that I at one point used to copy files in this way:
#!/usr/bin/env python
from multiprocessing.dummy import Pool
from subprocess import call
from functools import partial
root = "/path/to/root/"  # placeholder: root directory on the server
files = [
    root + "sub_1",  # placeholder: Sub 1
    root + "sub_2",  # placeholder: Sub 2
    root + "sub_3",  # placeholder: Sub 3
    # ... and so on for the remaining subdirectories
]
command_s = "scp -r -v -c arcfour -F /path/to/.ssh/config Server:"
command_e = " Output_Dir/"
max_processes = 4
# Transfer the files 4 at a time because my computer is busy with other stuff
cmds = []
for filename in files:
    cmds.append(command_s + filename + command_e)
pool = Pool(max_processes)
# Run the scp commands in parallel and report any that exit with an error
for i, returncode in enumerate(pool.imap(partial(call, shell=True), cmds)):
    if returncode != 0:
        print("%d command failed: %d" % (i, returncode))
Here is a way to preserve the directory structure and copy just the .png files from a server to a local system over ssh.
ssh user@server 'find /server/path -name "*.png" -print0 | xargs -0 tar -cf -' | tar -xvf - -C .
Source: Link
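The reason the tar-based approach avoids overwriting is that each file is stored under its relative path rather than its bare name. The following is a local sketch of that idea using Python's tarfile module; it does not involve ssh, and the directory layout and file name plot.png are made up for the example.

```python
import tarfile
import tempfile
from pathlib import Path

# Build a sample tree in which every directory holds a file called plot.png.
src_root = Path(tempfile.mkdtemp())
for sample in ("a", "b"):
    out_dir = src_root / "coverages" / sample / "pos" / "output"
    out_dir.mkdir(parents=True)
    (out_dir / "plot.png").write_bytes(sample.encode())

# Pack only the .png files, storing each under its path relative to src_root.
archive = src_root / "pngs.tar"
with tarfile.open(archive, "w") as tar:
    for png in sorted(src_root.rglob("*.png")):
        tar.add(png, arcname=png.relative_to(src_root).as_posix())

# Unpack elsewhere: the directory structure is recreated, so the two
# identically named files do not collide.
dest = Path(tempfile.mkdtemp())
with tarfile.open(archive) as tar:
    tar.extractall(path=dest)

copied = sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*.png"))
print(copied)
```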