TensorFlow Serving vs. TensorFlow Inference (container type for SageMaker model) - tensorflow-serving

I am fairly new to TensorFlow (and SageMaker) and am stuck in the process of deploying a SageMaker endpoint. I have just recently succeeded in creating a Saved Model type model, which is currently being used to service a sample endpoint (the model was created externally). However, when I checked the image I am using for the endpoint, it says '.../tensorflow-inference', which is not the direction I want to go in because I want to use a SageMaker TensorFlow serving container (I followed tutorials from the official TensorFlow serving GitHub repo-using sample models, and they are deployed correcting using the TensorFlow serving framework).
Am I encountering this issue because my Saved Model does not have the correct 'serving' tag? I have not checked my tag sets yet but wanted to know if this would be the core reason to the problem. Also, most importantly, what are the differences between the two container types-I think having a better understanding of these two concepts would show me why I am unable to produce the correct image.
This is how I deployed the sample endpoint:
model = Model(model_data =...)
predictor = model.deploy(initial_instance_count=...)
When I run the code, I get a model, an endpoint configuration, and an endpoint. I got the container type by clicking on model details within the AWS SageMaker console.

There are two APIs for deploying TensorFlow models: tensorflow.Model and tensorflow.serving.Model. It isn't clear from the code-snippet which one you're using, but the SageMaker docs recommend the latter for deploying from pre-existing s3 artifacts:
from sagemaker.tensorflow.serving import Model
model = Model(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
Reference: https://github.com/aws/sagemaker-python-sdk/blob/c919e4dee3a00243f0b736af93fb156d17b04796/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#deploying-directly-from-model-artifacts
it says '.../tensorflow-inference', which is not the direction I want to go in because I want to use a SageMaker TensorFlow serving container
If you haven't specified an image argument for tensorflow.Model, SageMaker should be using the default TensorFlow serving image (seems like "../tensorflow-inference").
image (str) – A Docker image URI (default: None). If not specified, a default image for TensorFlow Serving will be used.
If all of this seems needlessly complex to you, I'm working on a platform that makes this set up a single line of code -- I'd love for you to try it, dm me at https://twitter.com/yoavz_.

There are different versions for the framework containers. Since the framework version I'm using is 1.15, the image I got had to be in a tensorflow-inference container. If I used versions <= 1.13, then I would get sagemaker-tensorflow-serving images. The two aren't the same, but there's no 'correct' container type.

Related

Serving "Frankenstein" (combined) models at scale

I have a tensorflow model that's combined with a clustering algorithm in (HDBSCAN). Both have been trained/fitted separately but they work together (tf -> hdbscan). I'm looking to serve predictions on GCP at scale.
Currently, I've created a custom serving container that stitches the models together in python, but you can imagine that this isn't very performant, especially since the tf model is loaded in eager mode. Are there canonical solutions to this problem?
An idea I have is to run the canonical tf server detached inside the container and have a outside facing server that intercepts request, passes it to the local tf server, then run the clustering algorithm on the tf server response, but I'm not sure how well this will work or if there's better ways.

How can I convert the model I trained with Tensorflow (python) for use with TensorflowJS without involving IBM cloud (from the step I'm at now)?

What I'm trying to do
I'm trying to learn TensorFlow object recognition and as usual with new things, I scoured the web for tutorials. I don't want to involve any third party cloud service or web development framework, I want to learn to do it with just native JavaScript, Python, and the TensorFlow library.
What I have so far
So far, I've followed a TensorFlow object detection tutorial (accompanied by a 5+ hour video) to the point where I've trained a model in Tensorflow (python) and want to convert it to run in a browser via TensorflowJS. I've also tried other tutorials and haven't seemed to find one that explains how to do this without a third party cloud / tool and React.
I know in order to use this model with tensorflow.js my goal is to get files like:
group1-shard1of2.bin
group1-shard2of2.bin
labels.json
model.json
I've gotten to the point where I created my TFRecord files and started training:
py Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config --num_train_steps=100
It seems after training the model, I'm left with:
files named checkpoint, ckpt-1.data-00000-of-00001, ckpt-1.index, pipeline.config
the pre-trained model (which I believe isn't the file that changes during training, right?) ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8
I'm sure it's not hard to get from this step to the files I need, but I honestly browsed a lot of documentation and tutorials and google and didn't see an example of doing it without some third party cloud service. Maybe it's in the documentation, I'm missing something obvious.
The project directory structure looks like this:
Where I've looked for an answer
For some reason, frustratingly, every single tutorial I've found (including the one linked above) for using a pre-trained Tensorflow model for object detection via TensorFlowJS has required the use of IBM Cloud and ReactJS. Maybe they're all copying from some tutorial they found and now all the tutorials include this, I don't know. What I do know is I'm building an Electron.js desktop app and object detection shouldn't require network connectivity assuming the compute is happening on the user's device. To clarify: I'm creating an app where the user trains the model, so it's not just a matter of one time conversion. I want to be able to train with Python Tensorflow and convert the model to run on JavaScript Tensorflow without any cloud API.
So I stopped looking for tutorials and tried looking directly at the documentation at https://github.com/tensorflow/tfjs.
When you get to the section about importing pre-trained models, it says:
Importing pre-trained models
We support porting pre-trained models from:
TensorFlow SavedModel
Keras
So I followed that link to Tensorflow SavedModel, which brings us to a project called tfjs-converter. That repo says:
This repository has been archived in favor of tensorflow/tfjs.
This repo will remain around for some time to keep history but all
future PRs should be sent to tensorflow/tfjs inside the tfjs-core
folder.
All history and contributions have been preserved in the monorepo.
Which sounds a bit like a circular reference to me, considering it's directing me to the page that just told me to go here. So at this point you're wondering well is this whole library deprecated, will it work or what? I look around in this repo anyway, into: https://github.com/tensorflow/tfjs-converter/tree/master/tfjs-converter
It says:
A 2-step process to import your model:
A python pip package to convert a TensorFlow SavedModel or TensorFlow Hub module to a web friendly format. If you already have a converted model, or are using an already hosted model (e.g. MobileNet), skip this step.
JavaScript API, for loading and running inference.
And basically says to create a venv and do:
pip install tensorflowjs
tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
--signature_name=serving_default \
--saved_model_tags=serve \
/mobilenet/saved_model \
/mobilenet/web_model
But wait, are the checkpoint files I have a "TensorFlow SavedModel"? This doesn't seem clear, the documentation doesn't explain. So I google it, find the documentation, and it says:
You can save and load a model in the SavedModel format using the
following APIs:
Low-level tf.saved_model API. This document describes how to use this
API in detail. Save: tf.saved_model.save(model, path_to_dir)
The linked syntax extrapolates somewhat:
tf.saved_model.save(
obj, export_dir, signatures=None, options=None
)
with an example:
class Adder(tf.Module):
#tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.float32)])
def add(self, x):
return x + x
model = Adder()
tf.saved_model.save(model, '/tmp/adder')
But so far, this isn't familiar at all. I don't understand how to take the results of my training process so far (the checkpoints) to load it into a variable model so I can pass it to this function.
This passage seems important:
Variables must be tracked by assigning them to an attribute of a
tracked object or to an attribute of obj directly. TensorFlow objects
(e.g. layers from tf.keras.layers, optimizers from tf.train) track
their variables automatically. This is the same tracking scheme that
tf.train.Checkpoint uses, and an exported Checkpoint object may be
restored as a training checkpoint by pointing
tf.train.Checkpoint.restore to the SavedModel's "variables/"
subdirectory.
And it might be the answer, but I'm not really clear on what it means as far as being "restored", or where I go from there, if that's even the right step to take. All of this is very confusing to someone learning TF which is why I looked for a tutorial that does it, but again, I can't seem to find one without third party cloud services / React.
Please help me connect the dots.
You can convert your model to TensorFlowJS format without any cloud services. I have laid out the steps below.
I'm sure it's not hard to get from this step to the files I need.
The checkpoints you see are in tf.train.Checkpoint format (relevant source code that creates these checkpoints in the object detection model code). This is different from the SavedModel and Keras formats.
We will go through these steps:
Checkpoint (current) --> SavedModel --> TensorFlowJS
Converting from tf.train.Checkpoint to SavedModel
Please see the script models/research/object_detection/export_inference_graph.py to convert the Checkpoint files to SavedModel.
The code below is taken from the docs of that script. Please adjust the paths to your specific project. --input_type should remain as image_tensor.
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
In the output directory, you should see a savedmodel directory. We will use this in the next step.
Converting SavedModel to TensorFlowJS
Follow the instructions at https://github.com/tensorflow/tfjs/tree/master/tfjs-converter, specifically paying attention to the "TensorFlow SavedModel example". The example conversion code is copied below. Please modify the input and output paths for your project. The --signature_name and --saved_model_tags might have to be changed, but hopefully not.
tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
--signature_name=serving_default \
--saved_model_tags=serve \
/mobilenet/saved_model \
/mobilenet/web_model
Using the TensorFlowJS model
I know in order to use this model with tensorflow.js my goal is to get files like:
group1-shard1of2.bin
group1-shard2of2.bin
labels.json
model.json
The steps above should create these files for you, though I don't think labels.json will be created. I am not sure what that file should contain. TensorFlowJS will use model.json to construct the inference graph, and it will load the weights from the .bin files.
Because we converted a TensorFlow SavedModel to a TensorFlowJS model, we will need to load the JS model with tf.loadGraphModel(). See the tfjs converter page for more information.
Note that for TensorFlowJS, there is a difference between a TensorFlow SavedModel and a Keras SavedModel. Here, we are dealing with a TensorFlow SavedModel.
The Javascript code to run the model is probably out of scope for this answer, but I would recommend reading this TensorFlowJS tutorial. I have included a representative javascript portion below.
import * as tf from '#tensorflow/tfjs';
import {loadGraphModel} from '#tensorflow/tfjs-converter';
const MODEL_URL = 'model_directory/model.json';
const model = await loadGraphModel(MODEL_URL);
const cat = document.getElementById('cat');
model.execute(tf.browser.fromPixels(cat));
Extra notes
... Which sounds a bit like a circular reference to me,
The TensorFlowJS ecosystem has been consolidated in the tensorflow/tfjs GitHub repository. The tfjs-converter documentation lives there now. You can create a pull request to https://github.com/tensorflow/tfjs to fix the SavedModel link to point to the tensorflow/tfjs repository.

Does Tensorflow server serve/support non-tensorflow based libraries like scikit-learn?

Actually we are creating a platform to be able to put AI usecases in production. TFX is the first choice but what if we want to use non-tensorflow based libraries like scikit learn etc and want to include a python script to create models. Will output of such a model be served by tensorflow server. How can I make sure to be able to run both tensorflow based model and non-tensorflow based libraries and models in one system design. Please suggest.
Mentioned below is the procedure to Deploy and Serve a Sci-kit Learn Model in Google Cloud Platform.
First step is to Save/Export the SciKit Learn Model using the below code:
from sklearn.externals import joblib
joblib.dump(clf, 'model.joblib')
Next step is to upload the model.joblib file to Google Cloud Storage.
After that, we need to create our model and version, specifying that we are loading up a scikit-learn model, and select the runtime version of Cloud ML engine, as well as the version of Python that we used to export this model.
Next, we need to present the data to Cloud ML Engine as a simple array, encoded as a json file, like shown below. We can use JSON Library as well.
print(list(X_test.iloc[10:11].values))
Next, we need to run the below command to perform the Inference,
gcloud ml-engine predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_FILE
For more information, please refer this link.

Is it possible to share a Tensorflow model among Gunicorn workers?

I need to put in production a Tensorflow model with a simple APIs endpoint. The model should be shared among processes/workers/threads in order to not waste too many resources in terms of memory.
I already tried with multiple gunicorn workers setting the --preload option and loading the model before the definition of the resources, but after a while I get timeout error. The model is not responding. I know that there is Tensorflow serving service available for this purpose, but the problem is about the fact that the model is not still available at deployment time and the ML pipeline is composed of many different components (the Tensorflow model is just one of these components). The end user (customer) is the one that trains/saves the model. (I am using docker) Thanks in advance
The issue is that the Tensorflow runtime and global state is not fork safe.
See this thread: https://github.com/tensorflow/tensorflow/issues/5448 or this one: https://github.com/tensorflow/tensorflow/issues/51832
You could attempt to load the model only in the workers using the post_fork and post_worker_init server hooks.

Using tensorflow hub with go

I want to use pre trained models in my go application. Especially the Inception-ResNet-v2 model.
This model seems to be only available via tensorflow hub (https://www.tensorflow.org/hub/).
However I could not find any documentation how to use tensorflow hub with the go language bindings for tensorflow.
How can I download and use these models in go?
So after a lot of work in the past few days I finally found a way.
At first I wanted to just use Python to do all the Tensorflow stuff and then provide the results via a rest service. However it turned out that the number of models provided by Tensorflow Hub is very small. This was a problem for me because I had to try out different models and compare them.
Thus I switched to using models from https://github.com/tensorflow/models. There are several tutorials how to export the data to .pb files. Those files can then be loaded in Go using gocv.
It requires a lot of work to convert the files, but in the end I think this is the best way to use Tensorflow models in go.