How do I validate a spacy model at a nonstandard location? - spacy

I'd like to check whether spaCy is compatible with the model that I installed to a nonstandard location. For example:
import spacy, os
nlp = spacy.load("../data/p1/p2/en_core_web_lg-3.2.0")
os.system("python -m spacy validate")
Problem: the command above only validates spaCy against models installed in the standard location; my model lives at a nonstandard location: ../data/p1/p2/en_core_web_lg-3.2.0
I'd like to do it in code. The command line
python -m spacy validate
does not take arguments. I'd like to do something like
assert(spacy.validate("../data/p1/p2/en_core_web_lg-3.2.0"))
before actually loading the model, but spacy has no function validate(). Or, is spacy.load() the only way to check for compatibility?

spacy validate just checks the names and versions of installed model packages against a compatibility list so it can tell you to update old official models; it doesn't actually validate the model data. It's there to help with the upgrade from v2 to v3 and to help the dev team when troubleshooting user reports.
If you have a custom model, just use spacy.load to check that it loads.
It sounds like you installed a non-custom (official) model to a nonstandard location. If you pip install it, spacy validate will check it; otherwise it won't be checked.
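If you just want a programmatic check before loading, here is a minimal sketch; the path is your example location, and reading meta.json directly to surface the declared version range is my assumption, not an official spaCy validation API:
import json
from pathlib import Path
import spacy

model_dir = Path("../data/p1/p2/en_core_web_lg-3.2.0")

# The package's meta.json records the spaCy version range it was built for.
meta = json.loads((model_dir / "meta.json").read_text())
print("model requires spaCy", meta.get("spacy_version"), "| installed:", spacy.__version__)

# The authoritative compatibility check is simply attempting the load.
try:
    nlp = spacy.load(model_dir)
except Exception as err:
    raise SystemExit(f"Model at {model_dir} is not loadable with this spaCy: {err}")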

Related

How to convert original yolo weights to TensorRT model?

I have developed an improved version of the yolov4-tiny model.
I would like to convert this developed model to a TensorRT model, but after referring to the attached URL, I found that I can only convert the original v4-tiny model.
My question is, how are other people converting their original models to TensorRT?
Thank you in advance.
URL
I understand that you have a custom model that you have trained yourself and you want to convert it to TensorRT.
There are many ways to convert a model to TensorRT. The process depends on which format your model is in, but here's one that works for all formats:
Convert your model to ONNX format
Convert the model from ONNX to TensorRT using trtexec
Detailed steps
I assume your model is in TensorFlow saved-model format; at least the train.py in the repository you linked saves models to that format, and tf2onnx is the tool for converting it to ONNX.
Note that tf2onnx recommends Python 3.7. You can install it here and create a virtual environment using conda or venv if you are using another version of Python.
Then, install tf2onnx:
pip install git+https://github.com/onnx/tensorflow-onnx
Convert your model from saved-model to ONNX
python3 -m tf2onnx.convert --saved-model ./model --output model.onnx
If you are using some other tf format for your model please see the readme of tf2onnx for help.
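If you want to sanity-check the exported ONNX file before building the engine, a small optional sketch using the onnx package (the file name simply follows the command above):
import onnx

# Load the exported graph and run ONNX's structural checker on it.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print("inputs:", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])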
Then install TensorRT and its dependencies using this guide if you haven't already installed it. Alternatively you can use Nvidia Containers (NGC).
After you have installed TensorRT, run the following command to convert your model using fp16 precision.
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=3000 --buildOnly
You can check all CLI arguments by running
/usr/src/tensorrt/bin/trtexec --help
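After trtexec finishes, you can deserialize the engine from Python to confirm it built correctly. This is only a sketch; it assumes the tensorrt Python bindings are installed, and it uses the older binding-inspection API (pre-TensorRT 8.5 style):
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine that trtexec saved and list its input/output bindings.
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), engine.get_binding_shape(i))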
For YOLO v3-v5 you can use a project that manually parses the cfg and weights files and builds and saves the TensorRT engine file itself, for example enazoe/yolo-tensorrt. I'm using this code in Multitarget-tracker as a fast object detector on Windows/Linux x86/Nvidia Jetson.
In this case you don't need to install trtexec or other NVIDIA software.

How to check which Tensorflow version is compatible to Tensorflow Model Garden?

In order to use a pre-trained model with Tensorflow, we clone the Model Garden for TensorFlow, then choose a model in Model Zoo, for example, Detection Model Zoo: EfficientDet D0 512x512.
Is there any way to detect the right version of Tensorflow, e.g. 2.7.0, 2.7.1, or 2.8.0, that will surely work with the aforementioned setup?
The documentation (README.md) doesn't seem to mention this requirement. Maybe it is implied somehow?
I checked setup.py for Object Detection, but there is still no clue!
\models\research\object_detection\packages\tf2\setup.py
REQUIRED_PACKAGES = [
    # Required for apache-beam with PY3
    'avro-python3',
    'apache-beam',
    'pillow',
    'lxml',
    'matplotlib',
    'Cython',
    'contextlib2',
    'tf-slim',
    'six',
    'pycocotools',
    'lvis',
    'scipy',
    'pandas',
    'tf-models-official>=2.5.1',
    'tensorflow_io',
    'keras'
]
I am not aware of a formal/quick way to determine the right Tensorflow version, given a specific Model Garden version, master branch. However, here is my workaround:
In the REQUIRED_PACKAGES above, we see tf-models-official>=2.5.1.
Checking the package history on pypi.org, the latest version, as of 03.02.2022, is 2.8.0.
So when installing this \models\research\object_detection\packages\tf2\setup.py file, pip will naturally fetch the latest version of tf-models-official, which is 2.8.0, thanks to the >= specifier.
However, for tf-models-official v2.8.0, the required packages are defined in tf-models-official-2.8.0\tf_models_official.egg-info\requires.txt (Note: download the package and extract it, using the link.)
Here, we find out:
tensorflow~=2.8.0
...meaning the required Tensorflow version is 2.8.*.
This may not be desired, e.g. in Colab, where the preinstalled version is currently 2.7.0.
To work around this, we should use tf-models-official v2.7.0, which matches the Tensorflow version. In that version's requires.txt, we see tensorflow>=2.4.0, which is already satisfied by Colab's default Tensorflow version (2.7.0).
To make this workaround possible, the \models\research\object_detection\packages\tf2\setup.py should be modified from 'tf-models-official>=2.5.1' to 'tf-models-official==2.7.0'.
Caveat: I think this hack doesn't affect the functionality of the Object Detection API because it originally demands any tf-models-official >= 2.5.1. We just simply fix it to ==2.7.0 instead.
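As a quick sanity check after installation (a generic sketch, nothing Model-Garden-specific), you can confirm from Python that the two packages ended up on matching versions:
import importlib.metadata as md  # Python 3.8+

# Print the installed versions so you can verify they line up (e.g. tensorflow 2.7.x with tf-models-official 2.7.0).
for pkg in ("tensorflow", "tf-models-official"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")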

How can I convert the model I trained with Tensorflow (python) for use with TensorflowJS without involving IBM cloud (from the step I'm at now)?

What I'm trying to do
I'm trying to learn TensorFlow object recognition and as usual with new things, I scoured the web for tutorials. I don't want to involve any third party cloud service or web development framework, I want to learn to do it with just native JavaScript, Python, and the TensorFlow library.
What I have so far
So far, I've followed a TensorFlow object detection tutorial (accompanied by a 5+ hour video) to the point where I've trained a model in Tensorflow (python) and want to convert it to run in a browser via TensorflowJS. I've also tried other tutorials and haven't seemed to find one that explains how to do this without a third party cloud / tool and React.
I know in order to use this model with tensorflow.js my goal is to get files like:
group1-shard1of2.bin
group1-shard2of2.bin
labels.json
model.json
I've gotten to the point where I created my TFRecord files and started training:
py Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config --num_train_steps=100
It seems after training the model, I'm left with:
files named checkpoint, ckpt-1.data-00000-of-00001, ckpt-1.index, pipeline.config
the pre-trained model (which I believe isn't the file that changes during training, right?) ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8
I'm sure it's not hard to get from this step to the files I need, but I honestly browsed a lot of documentation and tutorials and google and didn't see an example of doing it without some third party cloud service. Maybe it's in the documentation, I'm missing something obvious.
The project directory structure looks like this (screenshot omitted):
Where I've looked for an answer
For some reason, frustratingly, every single tutorial I've found (including the one linked above) for using a pre-trained Tensorflow model for object detection via TensorFlowJS has required the use of IBM Cloud and ReactJS. Maybe they're all copying from some tutorial they found and now all the tutorials include this, I don't know. What I do know is I'm building an Electron.js desktop app and object detection shouldn't require network connectivity assuming the compute is happening on the user's device. To clarify: I'm creating an app where the user trains the model, so it's not just a matter of one time conversion. I want to be able to train with Python Tensorflow and convert the model to run on JavaScript Tensorflow without any cloud API.
So I stopped looking for tutorials and tried looking directly at the documentation at https://github.com/tensorflow/tfjs.
When you get to the section about importing pre-trained models, it says:
Importing pre-trained models
We support porting pre-trained models from:
TensorFlow SavedModel
Keras
So I followed that link to Tensorflow SavedModel, which brings us to a project called tfjs-converter. That repo says:
This repository has been archived in favor of tensorflow/tfjs.
This repo will remain around for some time to keep history but all
future PRs should be sent to tensorflow/tfjs inside the tfjs-core
folder.
All history and contributions have been preserved in the monorepo.
Which sounds a bit like a circular reference to me, considering it's directing me to the page that just told me to go here. So at this point you're wondering well is this whole library deprecated, will it work or what? I look around in this repo anyway, into: https://github.com/tensorflow/tfjs-converter/tree/master/tfjs-converter
It says:
A 2-step process to import your model:
A python pip package to convert a TensorFlow SavedModel or TensorFlow Hub module to a web friendly format. If you already have a converted model, or are using an already hosted model (e.g. MobileNet), skip this step.
JavaScript API, for loading and running inference.
And basically says to create a venv and do:
pip install tensorflowjs
tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
--signature_name=serving_default \
--saved_model_tags=serve \
/mobilenet/saved_model \
/mobilenet/web_model
But wait, are the checkpoint files I have a "TensorFlow SavedModel"? This doesn't seem clear, the documentation doesn't explain. So I google it, find the documentation, and it says:
You can save and load a model in the SavedModel format using the
following APIs:
Low-level tf.saved_model API. This document describes how to use this
API in detail. Save: tf.saved_model.save(model, path_to_dir)
The linked syntax extrapolates somewhat:
tf.saved_model.save(
obj, export_dir, signatures=None, options=None
)
with an example:
class Adder(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.float32)])
    def add(self, x):
        return x + x

model = Adder()
tf.saved_model.save(model, '/tmp/adder')
But so far, this isn't familiar at all. I don't understand how to take the results of my training process so far (the checkpoints) to load it into a variable model so I can pass it to this function.
This passage seems important:
Variables must be tracked by assigning them to an attribute of a
tracked object or to an attribute of obj directly. TensorFlow objects
(e.g. layers from tf.keras.layers, optimizers from tf.train) track
their variables automatically. This is the same tracking scheme that
tf.train.Checkpoint uses, and an exported Checkpoint object may be
restored as a training checkpoint by pointing
tf.train.Checkpoint.restore to the SavedModel's "variables/"
subdirectory.
And it might be the answer, but I'm not really clear on what it means as far as being "restored", or where I go from there, if that's even the right step to take. All of this is very confusing to someone learning TF which is why I looked for a tutorial that does it, but again, I can't seem to find one without third party cloud services / React.
Please help me connect the dots.
You can convert your model to TensorFlowJS format without any cloud services. I have laid out the steps below.
I'm sure it's not hard to get from this step to the files I need.
The checkpoints you see are in tf.train.Checkpoint format (relevant source code that creates these checkpoints in the object detection model code). This is different from the SavedModel and Keras formats.
We will go through these steps:
Checkpoint (current) --> SavedModel --> TensorFlowJS
Converting from tf.train.Checkpoint to SavedModel
Please see the script models/research/object_detection/export_inference_graph.py to convert the Checkpoint files to SavedModel.
The code below is taken from the docs of that script. Please adjust the paths to your specific project. --input_type should remain as image_tensor.
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
In the output directory, you should see a savedmodel directory. We will use this in the next step.
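Before running the TFJS converter, you can optionally load the exported SavedModel in Python and print its serving signature to confirm the export worked. This is just a sketch; the path is a placeholder for your own output directory:
import tensorflow as tf

# Load the directory produced by the export script and inspect its default serving signature.
saved_model_dir = "path/to/exported_model_directory/saved_model"
loaded = tf.saved_model.load(saved_model_dir)
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)
print(infer.structured_outputs)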
Converting SavedModel to TensorFlowJS
Follow the instructions at https://github.com/tensorflow/tfjs/tree/master/tfjs-converter, specifically paying attention to the "TensorFlow SavedModel example". The example conversion code is copied below. Please modify the input and output paths for your project. The --signature_name and --saved_model_tags might have to be changed, but hopefully not.
tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
--signature_name=serving_default \
--saved_model_tags=serve \
/mobilenet/saved_model \
/mobilenet/web_model
Using the TensorFlowJS model
I know in order to use this model with tensorflow.js my goal is to get files like:
group1-shard1of2.bin
group1-shard2of2.bin
labels.json
model.json
The steps above should create these files for you, though I don't think labels.json will be created. I am not sure what that file should contain. TensorFlowJS will use model.json to construct the inference graph, and it will load the weights from the .bin files.
Because we converted a TensorFlow SavedModel to a TensorFlowJS model, we will need to load the JS model with tf.loadGraphModel(). See the tfjs converter page for more information.
Note that for TensorFlowJS, there is a difference between a TensorFlow SavedModel and a Keras SavedModel. Here, we are dealing with a TensorFlow SavedModel.
The Javascript code to run the model is probably out of scope for this answer, but I would recommend reading this TensorFlowJS tutorial. I have included a representative javascript portion below.
import * as tf from '@tensorflow/tfjs';
import {loadGraphModel} from '@tensorflow/tfjs-converter';
const MODEL_URL = 'model_directory/model.json';
const model = await loadGraphModel(MODEL_URL);
const cat = document.getElementById('cat');
model.execute(tf.browser.fromPixels(cat));
Extra notes
... Which sounds a bit like a circular reference to me,
The TensorFlowJS ecosystem has been consolidated in the tensorflow/tfjs GitHub repository. The tfjs-converter documentation lives there now. You can create a pull request to https://github.com/tensorflow/tfjs to fix the SavedModel link to point to the tensorflow/tfjs repository.

Does Tensorflow server serve/support non-tensorflow based libraries like scikit-learn?

We are creating a platform to put AI use cases into production. TFX is the first choice, but what if we want to use non-TensorFlow libraries like scikit-learn and include a Python script to create models? Will the output of such a model be served by TensorFlow Serving? How can I make sure we can run both TensorFlow-based models and non-TensorFlow libraries and models in one system design? Please suggest.
Mentioned below is the procedure to Deploy and Serve a Sci-kit Learn Model in Google Cloud Platform.
The first step is to save/export the scikit-learn model using the code below:
import joblib  # in older scikit-learn versions this lived at sklearn.externals.joblib

joblib.dump(clf, 'model.joblib')
Next step is to upload the model.joblib file to Google Cloud Storage.
After that, we need to create our model and version, specifying that we are loading up a scikit-learn model, and select the runtime version of Cloud ML engine, as well as the version of Python that we used to export this model.
Next, we need to present the data to Cloud ML Engine as a simple array, encoded as a JSON file, as shown below. We can use the Python json library for this as well.
print(list(X_test.iloc[10:11].values))
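To turn that into the newline-delimited JSON file that --json-instances expects, a small sketch (the file name input.json is an assumption; substitute your own):
import json

# Write one JSON array per line; each line is one instance to be scored.
with open("input.json", "w") as f:
    for row in X_test.iloc[10:11].values.tolist():
        f.write(json.dumps(row) + "\n")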
Next, run the command below to perform inference:
gcloud ml-engine predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_FILE
For more information, please refer to this link.

Spacy v2 alpha - Can't find factory for 'TokenVectorEncoder'

I have trained a model based on spacy.blank('en') and used nlp.to_disk to save it. However, when I come to do spacy.load('path/to/model'), I hit the error Can't find factory for 'TokenVectorEncoder'.
Inside the model folder there's a TokenVectorEncoder dir, and the meta.json file mentions TokenVectorEncoder too.
Any ideas?
Are you loading the model with the same spaCy alpha version you used for training? It looks like this error might be related to a change in the very latest v2.0.0a17, which doesn't use the "tensorizer", i.e. TokenVectorEncoder in the model pipeline anymore. So the easiest solution would probably be to re-train your model with the latest version and then re-package and load it with the same version you used for training.
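A quick way to check this (a sketch; the path is a placeholder) is to compare the version recorded in the saved model's meta.json with the spaCy version you are loading it with:
import json
import spacy

# meta.json stores the spaCy version the model was trained/saved with.
meta = json.load(open("path/to/model/meta.json"))
print("saved with spaCy", meta.get("spacy_version"), "| loading with", spacy.__version__)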