Using "Spacy package" on trained model: error "Can't locate model data" - spacy

I'm attempting to train the NER within SpaCy to recognize a new set of entities. Everything works just fine until I try to save and reload the model.
I'm attempting to follow the SpaCy doc recommendations from https://spacy.io/usage/training#saving-loading, so I have been saving with:
model.to_disk("save_this_model")
and then going to the Command Line and attempting to turn it into a package using:
python -m spacy package save_this_model saved_model_package
so I can then use
spacy.load('saved_model_package')
to pull the model back up.
However, when I'm attempting to use spacy package from the Command Line, I keep getting the error message "Can't locate model data"
I've looked in the save_this_model directory and there is a meta.json there, as well as folders for the various pipes (I've tried this with all pipes saved and with the non-NER pipes disabled; neither works).
Does anyone know what I might be doing wrong here?
I'm pretty inexperienced, so I think it's very possible that I'm attempting to make a package incorrectly or committing some other basic error. Thank you very much for your help in advance!

The spacy package command will create an installable and loadable Python package based on your model data, which you can store as a single .tar.gz file and pip install. If you just want to load a model you've saved out, you usually don't even need to package it – you can simply pass the path to the model directory to spacy.load. For example:
nlp = spacy.load('/path/to/save_this_model')
spacy.load can take a path to a model directory, a model package name, or the name of a shortcut link, if available.
If you're new to spaCy and just experimenting with training models, loading them from a directory is usually the simplest solution. Model packages come in handy if you want to share your model with others (because you can share it as one installable file), or if you want to integrate it into your CI workflow or test suite (because the model can be a component of your application, like any other package it depends on).
So if you do want a Python package, you'll first need to build it by running the package setup from within the directory created by spacy package:
cd saved_model_package
python setup.py sdist
You can find more details in the docs. The above command will create a .tar.gz archive in a dist/ subdirectory, which you can then install in your environment:
pip install /path/to/en_example_model-1.0.0.tar.gz
If the model installed correctly, it should show up in the installed packages when you run pip list or pip freeze. To load it, you can call spacy.load with the package name, which is usually the language code plus the name you specified when you packaged the model. In this example, en_example_model:
nlp = spacy.load('en_example_model')
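The "language code plus name" convention mentioned above can be sketched in a few lines of Python (a toy illustration only, not part of spaCy's API; it just reads the lang and name fields that spacy package writes into meta.json):

```python
import json
from pathlib import Path

def package_name(model_dir):
    """Derive the installable package name from a model directory's
    meta.json: the language code and model name joined by an underscore."""
    meta = json.loads((Path(model_dir) / "meta.json").read_text())
    return "{}_{}".format(meta["lang"], meta["name"])

# A meta.json containing {"lang": "en", "name": "example_model"}
# yields the package name "en_example_model".
```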

Related

How do I install TensorFlow 2 & the object_detection module?

Background
I've been trying to follow the tutorial in this video. The goal is trying to install TensorFlow and TensorFlow's object_detection module.
Goal
How do I install it so that I can follow the rest of the tutorial? I only want to install the CPU version.
Additional Information
Errors that I ran into
ERROR: Could not find a version that satisfies the requirement tensorflow==2.1.0 (from versions: None) ERROR: No matching distribution found for tensorflow
ERROR: tensorflow.whl is not a supported wheel on this platform.
Research
https://github.com/tensorflow/tensorflow/issues/39130
Tensorflow installation error: not a supported wheel on this platform
Prologue
I found this ridiculously complex. If anyone else has a simpler way to install this package, please let everyone else know.
Main resource is https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#set-env
Summary of Steps
Get the newest update of Python (64-bit), which you can install from https://www.python.org/downloads/
Create a virtual environment from that newest version of python
Get the latest version of TensorFlow from Google - https://www.tensorflow.org/install/pip#package-location
Install the latest version of TensorFlow using pip with the --upgrade flag and the link from the step above
Get latest version of protoc (data transfer protocol) - https://github.com/protocolbuffers/protobuf/releases
Install protoc and add location to path so you can easily call it later
Get TensorFlow Garden files from here - https://github.com/tensorflow/models
Copy them to a location and add a models folder to the structure (detailed below)
Compile Protobufs for each model from the TensorFlow Garden using protoc
Set up COCO API to connect to the COCO dataset
Copy the TensorFlow 2 setup file from the TensorFlow Garden's object_detection module
Run the installation for object_detection module & hope for the best
Detailed Descriptions
I ran into a problem when first attempting to install object_detection because my version of Python wasn't supported.
Get the latest version by going to this page - https://www.python.org/downloads/
Click "Download Python 3.9.X"
Once downloaded, run the installation file
Navigate to where python was installed and copy the path to the executable.
Open up command prompt by going Windows Key -> cmd
Navigate to where you would like to create the virtual environment using cd "path/to/change/directory/to"
Then type "previously/copied/python/executable/path/python.exe" -m venv "name_of_your_virtual_environment"
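For reference, the same environment can also be created from Python itself using the standard-library venv module (a sketch; the directory name is a placeholder, and unlike `python -m venv` this call skips bootstrapping pip unless you pass with_pip=True):

```python
import os
import venv

# Create the virtual environment (equivalent to running
# "python.exe -m venv name_of_your_virtual_environment").
env_dir = "name_of_your_virtual_environment"
venv.create(env_dir, with_pip=False)  # pass with_pip=True to bootstrap pip too

# The interpreter inside the environment lives under Scripts/ on Windows
# and bin/ on other platforms.
subdir = "Scripts" if os.name == "nt" else "bin"
exe = "python.exe" if os.name == "nt" else "python"
python_exe = os.path.join(env_dir, subdir, exe)
```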
TensorFlow wheels seem to be hosted on the Google storage API rather than discovered through pip; to find the link to the latest stable TensorFlow, use
this website: https://www.tensorflow.org/install/pip#package-location
Now grab the TensorFlow installation link that matches your version of python.
Since mine was version 3.9 on Windows, I got this link - https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow_cpu-2.6.0-cp39-cp39-win_amd64.whl
Install TensorFlow using the python.exe from your virtual environment "name_of_your_virtual_environment":
"name_of_your_virtual_environment/Scripts/python.exe" -m pip install --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow_cpu-2.6.0-cp39-cp39-win_amd64.whl
Note that you have to use the --upgrade flag for some reason
Because TensorFlow is a Google project, it uses a special data interchange format called Protocol Buffers (protobufs)
Find the latest version of this tool by navigating to their website - https://github.com/protocolbuffers/protobuf/releases
Find the link under the newest release that matches your operating system (Windows) and architecture (x64)
I chose https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protoc-3.17.3-win64.zip
To install it, extract the .zip file and put the contents into "C:/Program Files/GoogleProtoc"
Get the folder location that has the protoc executable and add it to your environment variables
To edit your environment variables, press the Windows Key, search for "Environment Variables" and click "Edit the system environment variables"
Then click "Environment Variables"
Navigate to the "Path" environment variable under your user, select it and click edit
Click new and paste the executable location of protoc, aka "C:/Program Files/GoogleProtoc/bin"
Now to get the actual code for the object_detection module, which is supported by researchers and is separate from base TensorFlow
Navigate to TensorFlow Garden - https://github.com/tensorflow/models
Download or clone the repository
Copy the files to another location using the following structure
TensorFlow
-> models (You have to add this folder)
   -> community
   -> official
   -> orbit
   -> research
Restart your command prompt. It needs to be restarted to pick up changes to environment variables - in this case
Path, because you added protoc to it to make it easier to call from your command prompt
Again that is Windows Key -> Search cmd
Navigate inside the research folder with cd "TensorFlow/models/research/"
Run the command to compile the Protobuf libraries: for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.
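The for-loop above is specific to cmd.exe; as a rough cross-platform equivalent, here is a Python sketch that builds the same protoc invocations (it only constructs the commands - actually running them assumes protoc is on your Path):

```python
from pathlib import Path

def protoc_commands(research_dir):
    """Build one protoc invocation per .proto file under
    object_detection/protos, mirroring the cmd.exe for-loop."""
    protos = sorted(Path(research_dir).glob("object_detection/protos/*.proto"))
    return [["protoc", str(p), "--python_out=."] for p in protos]

# Each command could then be run from inside models/research with, e.g.:
#   subprocess.run(cmd, check=True, cwd=research_dir)
```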
Install the COCO API so that you can access the dataset. It is a requirement of TensorFlow's object_detection API
Ensure you are still in the folder "TensorFlow/models/research/"
Copy the setup python file to the folder you're in using copy object_detection/packages/tf2/setup.py .
Now use pip to perform the installation from the current directory: "name_of_your_virtual_environment/Scripts/python.exe" -m pip install --use-feature=2020-resolver .
Move the TensorFlow 2 setup file into the directory from which you will install the object_detection module.
Go to "TensorFlow/models/research/object_detection/packages/tf2/setup.py" and move it to "TensorFlow/models/research/object_detection/setup.py"
Now run the installation process for the object_detection module
Open CMD and navigate to "TensorFlow/models/research/object_detection/" by using cd command
Using your virtual environment, run "name_of_your_virtual_environment/Scripts/python.exe" setup.py install
Error Guides
ERROR: Could not find a version that satisfies the requirement tensorflow==2.1.0 (from versions: None) ERROR: No matching distribution found for tensorflow
This occurs because your version of Python isn't correct or the architecture is wrong (32-bit instead of 64-bit). Fix it by downloading a new version of Python and creating a new virtual environment.
ERROR: tensorflow.whl is not a supported wheel on this platform.
Similar to the above: your version of Python might be wrong, or you may have selected the wrong link from the TensorFlow repository on the Google Storage API. Start at the beginning: download the newest version of Python, create your new virtual environment, and then download the version of TensorFlow that matches your Python version and operating system (e.g. macOS, Linux, or Windows).
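The "not a supported wheel" error usually comes down to the CPython tag embedded in the wheel's filename not matching your interpreter. A small sketch of that check (the filename parsing is simplified, and the helper function is hypothetical, not part of pip):

```python
import sys

def wheel_matches_interpreter(wheel_name):
    """Check whether a wheel filename's CPython tag (e.g. cp39) matches
    the running interpreter. Wheel filenames have the shape:
    name-version-pythontag-abitag-platformtag.whl"""
    python_tag = wheel_name[:-len(".whl")].split("-")[-3]
    current = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return python_tag == current

# For tensorflow_cpu-2.6.0-cp39-cp39-win_amd64.whl this returns True only
# on CPython 3.9 - on any other version pip reports the wheel as
# "not a supported wheel on this platform".
```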

Tensorflow Handpose model offline loading

I am trying to download TensorFlow's handpose model files to run offline.
So, I have downloaded the handpose model files from here:
https://tfhub.dev/mediapipe/tfjs-model/handskeleton/1/default/1
But how can we use these files offline and predict in JavaScript, as well as in React Native code?
Just change all the URLs in the handpose package to point to the URL where you put your model (in your localhost/public_dir).
That works well for me :)

Install Pandas for Jupyter notebook in Binder

I have a Jupyter notebook in GitHub and wanted to run it in Binder so other people can play with it.
However it complains that pandas is not installed.
The error is:
ModuleNotFoundError: No module named 'pandas'
How can I get Binder to install pandas for this instance?
You have to edit/create requirements.txt at the base of the repo. I tried using pip install in a notebook cell, but that did not work for Binder, as it prevents live installations in your session.
You can list the modules you need and specify versions if you need to.
There is an example in this GitHub:
https://github.com/binder-examples/requirements/blob/master/requirements.txt
Contents are:
numpy==1.16.*
matplotlib==3.*
seaborn==0.8.1
pandas
It is worth mentioning that the requirements.txt file has to be created in the GitHub repository, not in the Binder UI. The Binder UI is read-only and will not sync any file back to GitHub: a requirements.txt created in the Binder UI will not be picked up, and it will not survive reloading the runtime or refreshing the page. Once the requirements.txt is created, launch the Binder start page again, pointing at the GitHub repository.

python: converting an egg-info directory to dist-info

I am working on an existing python application inside of a virtualenv environment. It is already set up to use wheel within its deployment.
I have added another module which my application now needs, and this module only exists in egg format. It is currently installed among all the other modules within ./env/lib/python3.6/site-packages, and an egg-info directory exists for it.
My question is this: how do I convert this one egg-info directory to wheel format, so that it gets included in the application's deployment when I do the following? ...
python3 setup.py bdist_wheel upload -r XXXXXXXX
Assuming I have installed a module under ./env/lib/python3.6/site-packages/the-module-1.2.3.egg-info, what are the steps to convert that module to dist-info?
Note that I don't see any *.egg file for that module, only the egg-info directory.
Thank you.

Embedded Python 3 not creating .pyc files when using importlib

I'm trying to embed a Python 3 interpreter in an Objective C Cocoa app on a Mac, following instructions in this answer (which extends this article) and building Python and PyObjC by hand.
I'd like to be able to run Python code as plugins. I specifically don't want to rely on the stock Apple Python (v2.7). I have most of it working, but can't seem to reliably load the plugin scripts. It looks like the embedded Python interpreter is unable to create the __pycache__/*.pyc files; this may be a symptom, or a cause. If I import the plugin file manually from the Python 3 REPL (via import or the imp or importlib modules), the .pyc is generated and the plugin then loads correctly. If I don't do this manually, the .pyc is not created and I receive a ValueError: "Unmarshallable object".
I've tried loosening permissions on the script directory to no avail. The cache_tag looks OK, both from the REPL and from within the bouncer script:
>>> sys.implementation.cache_tag
'cpython-35'
py_compile raises a Cocoa exception if I try to compile the plugin file manually (I'm still digging into that).
I'm using the following:
OS X 10.11.5 (El Capitan)
XCode 7.2.1
Python v3.5.2
PyObjC v3.11
I had to make a couple of necessary tweaks to the process outlined in the linked SO answer:
Compiling Python 3 required Homebrew versions of OpenSSL and zlib, and appropriate LDFLAGS and CPPFLAGS:
export CPPFLAGS="-I$(brew --prefix openssl)/include -I$(brew --prefix zlib)/include"
export LDFLAGS="-L$(brew --prefix openssl)/lib -L$(brew --prefix zlib)/lib"
I also ensure pip is installed OK when configuring Python to build:
./configure --prefix="/path/to/python/devbuild/python3.5.2" --with-ensurepip=install
There is a fork of the original article source (which uses the stock Python 2) that works fine here, so I suspect I'm not too far off the mark. Any idea what I've missed? Do I need to sign, or otherwise give permission to, the embedded Python? Are there compilation/configuration options I've neglected to set?
TIA
Typical. It's always the last thing you try, isn't it? Adding the directory containing the plugin scripts to sys.path seems to do the trick, although I'm not sure why importlib needs this (I thought the point was to allow you to circumvent the normal import mechanism). Perhaps it's to do with the way the default importlib.machinery.SourceFileLoader is implemented?
Something like:
sys.path.append(os.path.abspath("/path/to/plugin/scripts"))
makes the "Unmarshallable object" problem go away. The cache directory and .pyc files are created correctly.
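For anyone hitting the same thing, here is a minimal, self-contained sketch of the fix (the plugin module and its contents are made up for illustration): append the plugin directory to sys.path before importing, and the import succeeds and the __pycache__/*.pyc appears.

```python
import importlib
import os
import sys
import tempfile

# A stand-in plugin directory containing one made-up plugin module.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "my_plugin.py"), "w") as f:
    f.write("def greet():\n    return 'hello from plugin'\n")

# The fix: make the plugin directory importable before loading it.
sys.path.append(os.path.abspath(plugin_dir))

plugin = importlib.import_module("my_plugin")
print(plugin.greet())  # hello from plugin

# After a successful source import the bytecode cache is written
# (unless bytecode writing is disabled for the interpreter):
cache = os.path.join(plugin_dir, "__pycache__")
```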