How to use TreeTagger in Google Colab? - google-colaboratory

i want to use TreeTagger module to tag POS-information on the raw corpus.
As it seems to be faster to use GPU via Google Colab, I installed TreeTagger module, but Colab codes cannot locate TreeTagger directory.
The error type is like this:
TreeTaggerError: Can't locate TreeTagger directory (and no TAGDIR specified)
Please tell me where I should uplaod the treetagger folder.

You have to specify directory:
treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/') # treetagger is the installation dir
Installation in Colab.
Follow the instructions on the website.
In one cell in Colab you have to put the following (for other (not English) languages put other link for parameter files):
%%bash
mkdir treetagger
cd treetagger
# Download the tagger package for your system (PC-Linux, Mac OS-X, ARM64, ARMHF, ARM-Android, PPC64le-Linux).
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.4.tar.gz
tar -xzvf tree-tagger-linux-3.2.4.tar.gz
# Download the tagging scripts into the same directory.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
gunzip tagger-scripts.tar.gz
# Download the installation script install-tagger.sh.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/install-tagger.sh
# Download the parameter files for the languages you want to process.
# list of all files (parameter files) https://cis.lmu.de/~schmid/tools/TreeTagger/#parfiles
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/english.par.gz
sh install-tagger.sh
cd ..
sudo pip install treetaggerwrapper
And in the other following cell you can check the installation:
>>> import pprint # For proper print of sequences.
>>> import treetaggerwrapper
>>> #1) build a TreeTagger wrapper:
>>> tagger = treetaggerwrapper.TreeTagger(TAGLANG='en', TAGDIR='treetagger/')
>>> #2) tag your text.
>>> tags = tagger.tag_text("This is a very short text to tag.")
>>> #3) use the tags list... (list of string output from TreeTagger).
>>> pprint.pprint(tags)

Related

Run M-file in Google Colab but got error: no such file

I want to run M-file (Octave) in Colab. first, I upload all file in Google Drive with the following address:
/content/drive/MyDrive/Colab/Corroded/hybrid/FAEICA_ANN/
Then I used the following code to run my file:
!apt install octave
from google.colab import drive
drive.mount('/content/drive')
import sys
import os
py_file_location = "/content/drive/MyDrive/Colab/Corroded/hybrid/FAEICA_ANN"
sys.path.append(os.path.abspath(py_file_location))
!octave -W MainANN_FAEICA.m
But I got:
error: no such file, '/content/MainANN_FAEICA.m'
error: source: error sourcing file '/content/MainANN_FAEICA.m'
Although I added the path, Colab cannot recognize the file.
You've added the directory to Python's system search path, not to Octave's search path.
The simplest way to run this file is to change the working directory to where the file is, then run the file:
os.chdir(py_file_location)
!octave -W MainANN_FAEICA.m

import local file to google colab

I don't understand how colab works with directories, I created a notebook, and colab put it in /Google Drive/Colab Notebooks.
Now I need to import a file (data.py) where I have a bunch of functions I need. Intuition tells me to put the file in that same directory and import it with:
import data
but apparently that's not the way...
I also tried adding the directory to the set of paths but I am specifying the directory incorrectly..
Can anyone help with this?
Thanks in advance!
Colab notebooks are stored on Google Drive. But it is run on another virtual machine. So, you need to copy your data.py there too. Do this to upload data.py through Colab.
from google.colab import files
files.upload()
# choose the file on your computer to upload it then
import data
Now google is officially providing support for accessing and working with Gdrive at ease.
You can use the below code to mount your drive to Colab:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/My\ Drive/{location you want to move}
To easily upload a local file you can use the new Google Colab feature:
click on right arrow on the left of your screen (below the Google
Colab logo)
select Files tab
click Upload button
It will open a popup to choose file to upload from your local filesystem.
To upload Local files from system to collab storage/directory.
from google.colab import files
def getLocalFiles():
_files = files.upload()
if len(_files) >0:
for k,v in _files.items():
open(k,'wb').write(v)
getLocalFiles()
So, here is how I finally solved this. I have to point out however, that in my case I had to work with several files and proprietary modules that were changing all the time.
The best solution I found to do this was to use a FUSE wrapper to "link" colab to my google account. I used this particular tool:
https://github.com/astrada/google-drive-ocamlfuse
There is an example of how to set up your environment there, but here is how I did it:
# Install a Drive FUSE wrapper.
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
At this point you'll have installed the wrapper and the code above will generate a couple of links for you to authorize access to your google drive account.
The you have to create a folder in the colab file system (remember this is not persistent, as far as I know...) and mount your drive there:
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive
print ('Files in Drive:')
!ls drive/
the !ls command will print the directory contents so you can check it works, and that's it. You now have all the files you need and you can make changes to them with no further complications. Remember that you may need to restar the kernel to update the imports and variables.
Hope this works for someone!
you can write following commands in colab to mount the drive
from google.colab import drive
drive.mount('/content/gdrive')
and you can download from some external url into the drive through simple linux command wget like this
!wget 'https://dataverse.harvard.edu/dataset'

Changing directory in Google colab (breaking out of the python interpreter)

So I'm trying to git clone and cd into that directory using Google collab - but I cant cd into it. What am I doing wrong?
!rm -rf SwitchFrequencyAnalysis && git clone https://github.com/ACECentre/SwitchFrequencyAnalysis.git
!cd SwitchFrequencyAnalysis
!ls
datalab/ SwitchFrequencyAnalysis/
You would expect it to output the directory contents of SwitchFrequencyAnalysis - but instead its the root. I'm feeling I'm missing something obvious - Is it something to do with being within the python interpreter? (where is the documentation??)
Demo here.
use
%cd SwitchFrequencyAnalysis
to change the current working directory for the notebook environment (and not just the subshell that runs your ! command).
you can confirm it worked with the pwd command like this:
!pwd
further information about jupyter / ipython magics:
http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd
As others have pointed out, the cd command needs to start with a percentage sign:
%cd SwitchFrequencyAnalysis
Difference between % and !
Google Colab seems to inherit these syntaxes from Jupyter (which inherits them from IPython).
Jake VanderPlas explains this IPython behaviour here. You can see the excerpt below.
If you play with IPython's shell commands for a while, you might
notice that you cannot use !cd to navigate the filesystem:
In [11]: !pwd
/home/jake/projects/myproject
In [12]: !cd ..
In [13]: !pwd
/home/jake/projects/myproject
The reason is that
shell commands in the notebook are executed in a temporary subshell.
If you'd like to change the working directory in a more enduring way,
you can use the %cd magic command:
In [14]: %cd ..
/home/jake/projects
Another way to look at this: you need % because changing directory is relevant to the environment of the current notebook but not to the entire server runtime.
In general, use ! if the command is one that's okay to run in a separate shell. Use % if the command needs to be run on the specific notebook.
Use os.chdir. Here's a full example:
https://colab.research.google.com/notebook#fileId=1CSPBdmY0TxU038aKscL8YJ3ELgCiGGju
Compactly:
!mkdir abc
!echo "file" > abc/123.txt
import os
os.chdir('abc')
# Now the directory 'abc' is the current working directory.
# and will show 123.txt.
!ls
If you want to use the cd or ls functions , you need proper identifiers before the function names ( % and ! respectively)
use %cd and !ls to navigate
.
!ls # to find the directory you're in ,
%cd ./samplefolder #if you wanna go into a folder (say samplefolder)
or if you wanna go out of the current folder
%cd ../
and then navigate to the required folder/file accordingly
!pwd
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/Data')
!pwd
view this answer for detailed explaination
https://stackoverflow.com/a/61636734/11535267
I believe you'd have to mount the Google Drive first before you do anything else.
from google.colab import drive
drive.mount('/content/drive')

Shipping and using virtualenv in a pyspark job

PROBLEM: I am attempting to run a spark-submit script from my local machine to a cluster of machines. The work done by the cluster uses numpy. I currently get the following error:
ImportError:
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control). Otherwise reinstall numpy.
Original error was: cannot import name multiarray
DETAIL:
In my local environment I have setup a virtualenv that includes numpy as well as a private repo I use in my project and other various libraries. I created a zip file (lib/libs.zip) from the site-packages directory at venv/lib/site-packages where 'venv' is my virtual environment. I ship this zip to the remote nodes. My shell script for performing the spark-submit looks like this:
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--conf spark.pyspark.virtualenv.enabled=true \
--conf spark.pyspark.virtualenv.type=native \
--conf spark.pyspark.virtualenv.requirements=${parent}/requirements.txt \
--conf spark.pyspark.virtualenv.bin.path=${parent}/venv \
--py-files "${parent}/lib/libs.zip" \
--num-executors 1 \
--executor-cores 2 \
--executor-memory 2G \
--driver-memory 2G \
$parent/src/features/pi.py
I also know that on the remote nodes there is a /usr/local/bin/python2.7 folder that includes a python 2.7 install.
so in my conf/spark-env.sh I have set the following:
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
When I run the script I get the error above. If I screen print the installed_distributions I get a zero length list []. Also my private library imports correctly (which says to me it is actually accessing my libs.zip site-packages.). My pi.py file looks something like this:
from myprivatelibrary.bigData.spark import spark_context
spark = spark_context()
import numpy as np
spark.parallelize(range(1, 10)).map(lambda x: np.__version__).collect()
EXPECTATION/MY THOUGHTS:
I expect this to import numpy correctly especially since I know numpy works correctly in my local virtualenv. I suspect this is because I'm not actually using the version of python that is installed in my virtualenv on the remote node. My question is first, how do I fix this and second how do I use my virtualenv installed python on the remote nodes instead of the python that is just manually installed and currently sitting on those machines? I've seen some write-ups on this but frankly they are not well written.
With --conf spark.pyspark.{} and export PYSPARK_PYTHON=/usr/local/bin/python2.7 you set options for your local environment / your driver. To set options for the cluster (executors) use the following syntax:
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON
Furthermore, I guess you should make your virtualenv relocatable (this is experimental, however). <edit 20170908> This means that the virtualenv uses relative instead of absolute links. </edit>
What we did in such cases: we shipped an entire anaconda distribution over hdfs.
<edit 20170908>
If we are talking about different environments (MacOs vs. Linux, as mentioned in the comment below), you cannot just submit a virtualenv, at least not if your virtualenv contains packages with binaries (as is the case with numpy). In that case I suggest you create yourself a 'portable' anaconda, i.e. install Anaconda in a Linux VM and zip it.
Regarding --archives vs. --py-files:
--py-files adds python files/packages to the python path. From the spark-submit documentation:
For Python applications, simply pass a .py file in the place of instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
--archives means these are extracted into the working directory of each executor (only yarn clusters).
However, a crystal-clear distinction is lacking, in my opinion - see for example this SO post.
In the given case, add the anaconda.zip via --archives, and your 'other python files' via --py-files.
</edit>
See also: Running Pyspark with Virtualenv, a blog post by Henning Kropp.

Converting .ui to .py with Python 3.6 on PyQt5 [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
Improve this question
I can't convert ui to py
it's giving this:
Instead of installing Python packages by hand, I would consider using conda and pip from a recent Anaconda install (https://www.anaconda.com/download/).
After installing Anaconda with python 3.6, open a privileged (Run as Administrator) cmd or git bash and run the following commands:
Installing PyQt5
PyQt5 is the default one for Python 3.6. You can check available packages by running (conda search pyqt)
conda install pyqt
Generating .py file from .ui
python -m PyQt5.uic.pyuic -x [FILENAME].ui -o [FILENAME].py
Importing generated .py on your Python code
Now, suppose that your file is called MainWindow.py, and its type is QMainWindow. This is how you import it on Python
from PyQt5 import QtWidgets
from mainwindow import Ui_MainWindow
import sys
class ApplicationWindow(QtWidgets.QMainWindow):
def __init__(self):
super(ApplicationWindow, self).__init__()
self.ui = Ui_MainWindow()
self.ui.setupUi(self)
def main():
app = QtWidgets.QApplication(sys.argv)
application = ApplicationWindow()
application.show()
sys.exit(app.exec_())
if __name__ == "__main__":
main()
you are using the correct syntax: pyuic5 -x file.ui -o file.py
but you have to make sure that the file.ui is in the same location of your pyuic5.bat
python -m PyQt5.uic.pyuic -x [FILENAME].ui -o [FILENAME].py
This worked for me. Thanks to Danilo Gasques
Sorry for my bad english
You can find file "pyuic5.exe" (For example it is "C:\Python\venv\Scripts\pyuic.exe")
Through the command line go to the folder with the file "needToConvert.ui"
Enter the following command line: C:\Python\venv\Scripts\pyuic.exe needToConvert.ui -o needToConvert.py
What you need is python3.dll file which is missing and you have to place in your python directory
Go here (https://winpython.github.io/).
Download the version of python you have and also see which bit
version
Download the zero version and extract it somewhere temporarily
In extracted folder search for python3.dll in the search bar
Extract where your python setup is and try then it will work
Use this .bat file to automatically convert all *.ui files to python files.
All you need is:
Save the script below to ui2py.bat file
Edit the file with notepad and specify the pythonPath directory from your PC
Save changes and execute the ui2py.bat file convertor
#echo off
rem set the python path
set pythonPath=G:\Programming\WinPython-64bit-3.6.3.0Qt5\python-3.6.3.amd64
echo [START] converting .ui files...
rem convert all .ui files in current directory
for %%i in (*.ui) do (
rem Display the file name
echo %%i -- ui_%%~ni.py
rem converting
%pythonPath%\python.exe -m PyQt5.uic.pyuic -x %%i -o ui_%%~ni.py
)
echo [END] converting .ui files...