How to pass a *serialized* tensor to a TensorFlow Model Serving server? - tensorflow

I implemented a simple TF model. The model receives a serialized tensor of a grayscale image (simply a 2-D ndarray) and restores it to a 2-D tensor. Inference is then run on this 2-D tensor.
I deployed the model with TensorFlow Model Serving and tried to send a JSON string to the REST port as follows:
{
    "instances": [
        {"b64": bin_str},
    ]
}
I tried tf.io.serialize_tensor and similar functions to convert the input image into a serialized tensor and pass it to the server, but every attempt failed.
I would like to know how to send a serialized tensor to the serving server.
My saved model has the following signature:
signatures = {
    "serving_default": _get_serve_tf_examples_fn(
        model,
        transform_output).get_concrete_function(
            # explicitly specify input signature of serve_tf_examples_fn
            tf.TensorSpec(
                shape=[None],
                dtype=tf.string,
                name="examples")),
}
The definition of _get_serve_tf_examples_fn is:
def _get_serve_tf_examples_fn(model: tf.keras.models.Model,
                              transform_output: tft.TFTransformOutput):
    # get the Transform graph from the component
    model.tft_layer = transform_output.transform_features_layer()

    @tf.function
    def serve_tf_examples_fn(serialized: str) -> Dict:
        ''' Args: serialized: is serialized image tensor.
        '''
        feature_spec = transform_output.raw_feature_spec()
        # remove label spec.
        feature_spec.pop("label")
        # Deserialize the image tensor.
        parsed_features = tf.io.parse_example(
            serialized,
            feature_spec)
        # Preprocess the example using outputs of Transform pipeline.
        transformed_features = model.tft_layer(parsed_features)
        outputs = model(transformed_features)
        return {"outputs": outputs}

    return serve_tf_examples_fn
The above code segment receives a serialized tensor of a grayscale image (simply a 2-D ndarray) and restores it to a 2-D tensor. The model then runs inference on this 2-D tensor.
I would like to know how to send a serialized tensor to the REST port of the serving server.
Any help would be appreciated.
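One way to approach this, sketched under assumptions rather than confirmed against this model: the signature above takes a 1-D batch of strings named "examples" and parses them with tf.io.parse_example, so each element presumably needs to be a serialized tf.train.Example proto rather than the output of tf.io.serialize_tensor. The TensorFlow Serving REST API expects binary values as base64 text wrapped in a {"b64": ...} object. The helper name build_predict_body and the placeholder bytes are hypothetical; in practice the bytes would come from tf.train.Example(...).SerializeToString().

```python
import base64
import json
from typing import List


def build_predict_body(serialized_examples: List[bytes]) -> str:
    """Build a TF Serving REST predict body from raw serialized protos.

    Each element is assumed to be the bytes returned by
    tf.train.Example(...).SerializeToString() (not built here).
    """
    instances = [
        # Binary values must be base64 text wrapped in a {"b64": ...} object.
        {"b64": base64.b64encode(raw).decode("utf-8")}
        for raw in serialized_examples
    ]
    return json.dumps({"instances": instances})


# Placeholder bytes stand in for a real serialized tf.train.Example.
body = build_predict_body([b"\x0a\x02\x0a\x00"])
print(body)
```

The resulting string would be POSTed to http://<host>:8501/v1/models/<model_name>:predict, where 8501 is the default REST port and the host and model name depend on the deployment.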

Related

Deploying a TensorFlow model on Google Cloud that receives a base64 encoded string as a model input

I have successfully set up Google Cloud and deployed a pre-trained ML model that takes an input tensor (image) of shape=(?, 224, 224, 3) and dtype=float32. It works well, but sending the raw array is inefficient for REST requests; the request should really use a base64-encoded string. The challenge is that I am using transfer learning and cannot control the input of the original pre-trained model. To get around this without adding additional infrastructure, I created a small graph (wrapper) that handles the base64-to-array conversion and connected it to my pre-trained model graph, yielding a new single graph. The small graph takes an input tensor with shape=(), dtype=string and returns a tensor with shape=(224, 224, 3), dtype=float32, which can then be passed to the original model. The model compiles to a .pb file without errors and deploys successfully, but I get the following error when making my POST request:
{'error': 'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Index out of range using input dim 0; input has only 0 dims\n\t [[{{node lambda/map/while/strided_slice}}]]")'}
Post request body:
{'instances': [{'b64': 'iVBORw0KGgoAAAANSUhEUgAAAOAA...'}]}
This error leads me to believe that the POST request is incorrectly formatted for handling the base64 string, or that the input of my base64 conversion graph is set up incorrectly. I can run the code locally by calling predict on my combined model, passing it a tensor of shape=(), dtype=string constructed locally, and get a result successfully.
Here is my code for combining the 2 graphs:
import tensorflow as tf

# Local dependencies
from myProject.classifier_models import mobilenet
from myProject.dataset_loader import dataset_loader
from myProject.utils import f1_m, recall_m, precision_m

with tf.keras.backend.get_session() as sess:

    def preprocess_and_decode(img_str, new_shape=[224,224]):
        #img = tf.io.decode_base64(img_str)
        img = tf.image.decode_png(img_str, channels=3)
        img = (tf.cast(img, tf.float32)/127.5) - 1
        img = tf.image.resize_images(img, new_shape, method=tf.image.ResizeMethod.AREA, align_corners=False)
        # If you need to squeeze your input range to [0,1] or [-1,1] do it here
        return img

    InputLayer = tf.keras.layers.Input(shape = (1,),dtype="string")
    OutputLayer = tf.keras.layers.Lambda(lambda img : tf.map_fn(lambda im : preprocess_and_decode(im[0]), img, dtype="float32"))(InputLayer)
    base64_model = tf.keras.Model(InputLayer,OutputLayer)
    tf.keras.backend.set_learning_phase(0) # Ignore dropout at inference
    transfer_model = tf.keras.models.load_model('./trained_model/mobilenet_93.h5', custom_objects={'f1_m': f1_m, 'recall_m': recall_m, 'precision_m': precision_m})
    sess.run(tf.global_variables_initializer())
    base64_input = base64_model.input
    final_output = transfer_model(base64_model.output)
    new_model = tf.keras.Model(base64_input,final_output)
    export_path = '../myModels/001'
    tf.saved_model.simple_save(
        sess,
        export_path,
        inputs={'input_class': new_model.input},
        outputs={'output_class': new_model.output})
Tech: TensorFlow 1.13.1 & Python 3.5
I have looked at a bunch of related posts such as:
https://stackoverflow.com/a/50606625
https://stackoverflow.com/a/42859733
http://www.voidcn.com/article/p-okpgbnul-bvs.html (right-click translate to english)
https://cloud.google.com/ml-engine/docs/tensorflow/online-predict
Any suggestions or feedback would be greatly appreciated!
Update 06/12/2019:
Inspecting the 3 graph summaries everything appears correctly merged
Update 06/14/2019:
Ended up going with this alternative strategy instead, implementing a tf.estimator
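A possible reading of the error above, offered as a hypothesis rather than a confirmed diagnosis: the wrapper declares Input(shape=(1,)), so the Lambda indexes each element with im[0], which presumes a batch of shape (N, 1). A request body of the form {'instances': [{'b64': ...}]} would instead arrive as a batch of shape (N,), so each element inside tf.map_fn is a 0-d scalar and im[0] has no dimension to index, which is consistent with "input has only 0 dims". The pure-Python sketch below only mimics how nested request lists map to tensor shapes; instances_shape is a hypothetical helper, not part of any TensorFlow API.

```python
from typing import Any, Tuple


def instances_shape(value: Any) -> Tuple[int, ...]:
    """Return the shape implied by a nested list of request values.

    A {"b64": ...} object is treated as one scalar string, which is how
    binary values are represented in the JSON request body.
    """
    if isinstance(value, list):
        inner = instances_shape(value[0]) if value else ()
        return (len(value),) + inner
    return ()  # scalar: a number, a string, or a {"b64": ...} object


flat = [{"b64": "iVBORw0KGgo="}]      # one scalar string per instance
nested = [[{"b64": "iVBORw0KGgo="}]]  # one length-1 list per instance

print(instances_shape(flat))    # (1,)
print(instances_shape(nested))  # (1, 1)
```

Under this reading, either the request would need the nested form, or the wrapper would need to stop indexing im[0]. Neither option was verified against the deployed model.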

Tensorflow Serving, online predictions: How to build a signature_def that accepts 'image_bytes' as input tensor name?

I have successfully trained a Keras model and used it for predictions on my local machine; now I want to deploy it using TensorFlow Serving. My model takes images as input and returns a mask prediction.
According to the documentation here my instances need to be formatted like this:
{'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}
Now, the saved_model.pb file automatically saved by my Keras model has the following tensor names:
input_tensor = graph.get_tensor_by_name('input_image:0')
output_tensor = graph.get_tensor_by_name('conv2d_23/Sigmoid:0')
Therefore I need to save a new saved_model.pb file with a different signature_def.
I tried the following (see here for reference), which works:
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ['serve'], 'path/to/saved/model/')
    graph = tf.get_default_graph()
    input_tensor = graph.get_tensor_by_name('input_image:0')
    output_tensor = graph.get_tensor_by_name('conv2d_23/Sigmoid:0')
    tensor_info_input = tf.saved_model.utils.build_tensor_info(input_tensor)
    tensor_info_output = tf.saved_model.utils.build_tensor_info(output_tensor)
    prediction_signature = (
        tf.saved_model.signature_def_utils.build_signature_def(
            inputs={'image_bytes': tensor_info_input},
            outputs={'output_bytes': tensor_info_output},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
    builder = tf.saved_model.builder.SavedModelBuilder('path/to/saved/new_model/')
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'predict_images': prediction_signature, })
    builder.save()
But when I deploy the model and request predictions from the AI Platform, I get the following error:
RuntimeError: Prediction failed: Error processing input: Expected float32, got {'b64': 'Prm4OD7JyEg+paQkPrGwMD7BwEA'} of type 'dict' instead.
Adapting the answer here, I also tried to rewrite
input_tensor = graph.get_tensor_by_name('input_image:0')
as
image_placeholder = tf.placeholder(tf.string, name='b64')
graph_input_def = graph.as_graph_def()
input_tensor, = tf.import_graph_def(
    graph_input_def,
    input_map={'b64:0': image_placeholder},
    return_elements=['input_image:0'])
with the (wrong) understanding that this would add a layer on top of my input tensor with a matching 'b64' name (as per the documentation) that accepts a string and connects it to the original input tensor,
but the error from the AI Platform is the same.
The relevant code I use for requesting a prediction is:
instances = [{'image_bytes': {'b64': base64.b64encode(image).decode()}}]
response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()
where image is a numpy.ndarray of dtype('float32').
I feel I'm close, but I'm definitely missing something. Can you please help?
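After the image is base64-encoded and then decoded on the server, the buffer arrives as a string, which does not match your model's float32 input type.
You could try adding a preprocessing step to your model that converts the string input to the type the model expects, and then send the b64 request again.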
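To illustrate the point above with a minimal standard-library sketch: base64-encoding a float32 buffer and decoding it again yields raw bytes, not numbers, and recovering the floats requires knowing the element type and count. The values and variable names are made up for illustration; inside a TensorFlow graph, the analogous preprocessing step would presumably be something like tf.io.decode_raw followed by a reshape.

```python
import base64
import struct

# Hypothetical pixel values standing in for a float32 image buffer.
pixels = [0.0, 0.5, 1.0, 0.25]
raw = struct.pack("<4f", *pixels)  # 4 little-endian float32 values

# What the client does: bytes -> base64 text inside {"b64": ...}.
b64_text = base64.b64encode(raw).decode("ascii")

# What the server sees after decoding: a byte string, not floats.
received = base64.b64decode(b64_text)
print(type(received).__name__, len(received))  # bytes 16

# The preprocessing step must reinterpret the bytes as float32 explicitly.
restored = list(struct.unpack("<4f", received))
print(restored)  # [0.0, 0.5, 1.0, 0.25]
```

The signature input therefore has to be declared as a string tensor, with the conversion to float32 happening inside the graph before the original input_image tensor.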

using Estimator interface for inference with pre-trained tensorflow object detection model

I'm trying to load a pre-trained tensorflow object detection model from the Tensorflow Object Detection repo as a tf.estimator.Estimator and use it to make predictions.
I'm able to load the model and run inference using Estimator.predict(), however the output is garbage. Other methods of loading the model, e.g. as a Predictor, and running inference work fine.
Any help with properly loading a model as an Estimator and calling predict() would be much appreciated. My current code:
Load and prepare image
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(list(image.getdata())).reshape((im_height, im_width, 3)).astype(np.uint8)
image_url = 'https://i.imgur.com/rRHusZq.jpg'
# Load image
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
# Format original image size
im_size_orig = np.array(list(image.size) + [1])
im_size_orig = np.expand_dims(im_size_orig, axis=0)
im_size_orig = np.int32(im_size_orig)
# Resize image
image = image.resize((np.array(image.size) / 4).astype(int))
# Format image
image_np = load_image_into_numpy_array(image)
image_np_expanded = np.expand_dims(image_np, axis=0)
image_np_expanded = np.float32(image_np_expanded)
# Stick into feature dict
x = {'image': image_np_expanded, 'true_image_shape': im_size_orig}
# Stick into input function
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=x,
    y=None,
    shuffle=False,
    batch_size=128,
    queue_capacity=1000,
    num_epochs=1,
    num_threads=1,
)
Side note:
train_and_eval_dict also seems to contain an input_fn for prediction
train_and_eval_dict['predict_input_fn']
However this actually returns a tf.estimator.export.ServingInputReceiver, which I'm not sure what to do with. This could potentially be the source of my problems as there's a fair bit of pre-processing involved before the model actually sees the image.
Load model as Estimator
Model downloaded from TF Model Zoo here, code to load model adapted from here.
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/'
pipeline_config_path = os.path.join(model_dir, 'pipeline.config')
config = tf.estimator.RunConfig(model_dir=model_dir)
train_and_eval_dict = model_lib.create_estimator_and_inputs(
    run_config=config,
    hparams=model_hparams.create_hparams(None),
    pipeline_config_path=pipeline_config_path,
    train_steps=None,
    sample_1_of_n_eval_examples=1,
    sample_1_of_n_eval_on_train_examples=(5))
estimator = train_and_eval_dict['estimator']
Run inference
output_dict1 = estimator.predict(predict_input_fn)
This prints out some log messages, one of which is:
INFO:tensorflow:Restoring parameters from ./pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt
So it seems like pre-trained weights are getting loaded. However results look like:
Load same model as a Predictor
from tensorflow.contrib import predictor
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28'
saved_model_dir = os.path.join(model_dir, 'saved_model')
predict_fn = predictor.from_saved_model(saved_model_dir)
Run inference
output_dict2 = predict_fn({'inputs': image_np_expanded})
Results look good:
When you load the model as an estimator and from a checkpoint file, here is the restore function associated with ssd models. From ssd_meta_arch.py
def restore_map(self,
                fine_tune_checkpoint_type='detection',
                load_all_detection_checkpoint_vars=False):
  """Returns a map of variables to load from a foreign checkpoint.

  See parent class for details.

  Args:
    fine_tune_checkpoint_type: whether to restore from a full detection
      checkpoint (with compatible variable names) or to restore from a
      classification checkpoint for initialization prior to training.
      Valid values: `detection`, `classification`. Default 'detection'.
    load_all_detection_checkpoint_vars: whether to load all variables (when
      `fine_tune_checkpoint_type='detection'`). If False, only variables
      within the appropriate scopes are included. Default False.

  Returns:
    A dict mapping variable names (to load from a checkpoint) to variables in
    the model graph.

  Raises:
    ValueError: if fine_tune_checkpoint_type is neither `classification`
      nor `detection`.
  """
  if fine_tune_checkpoint_type not in ['detection', 'classification']:
    raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
        fine_tune_checkpoint_type))
  if fine_tune_checkpoint_type == 'classification':
    return self._feature_extractor.restore_from_classification_checkpoint_fn(
        self._extract_features_scope)
  if fine_tune_checkpoint_type == 'detection':
    variables_to_restore = {}
    for variable in tf.global_variables():
      var_name = variable.op.name
      if load_all_detection_checkpoint_vars:
        variables_to_restore[var_name] = variable
      else:
        if var_name.startswith(self._extract_features_scope):
          variables_to_restore[var_name] = variable
    return variables_to_restore
As you can see, even if the config file sets from_detection_checkpoint: True, only the variables in the feature extractor scope will be restored. To restore all the variables, you will have to set
load_all_detection_checkpoint_vars: True
in the config file.
So the situation is quite clear. When the model is loaded as an Estimator, only the variables in the feature extractor scope are restored; the weights in the predictors' scope are not, so the estimator gives essentially random predictions.
When the model is loaded as a Predictor, all weights are loaded, so the predictions are reasonable.
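The filtering behaviour described above can be mimicked in plain Python. This is a simplified sketch of the selection logic only, not the Object Detection API itself; the function name select_variables, the variable names, and the 'FeatureExtractor' scope are illustrative assumptions.

```python
from typing import Dict, List


def select_variables(var_names: List[str],
                     feature_extractor_scope: str,
                     load_all_detection_checkpoint_vars: bool) -> Dict[str, str]:
    """Mimic the 'detection' branch of restore_map on plain strings."""
    selected = {}
    for name in var_names:
        if load_all_detection_checkpoint_vars:
            selected[name] = name
        elif name.startswith(feature_extractor_scope):
            selected[name] = name
    return selected


# Illustrative variable names; real graphs contain many more.
names = [
    "FeatureExtractor/MobilenetV1/Conv2d_0/weights",
    "BoxPredictor_0/ClassPredictor/weights",
    "BoxPredictor_0/BoxEncodingPredictor/weights",
]

print(sorted(select_variables(names, "FeatureExtractor", False)))
print(sorted(select_variables(names, "FeatureExtractor", True)))
```

With the flag off, only the feature extractor entry is selected, so the box and class predictor weights would keep their initial values.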

Error when call prediction with base 64 input

I am using TensorFlow Hub's example to export a saved_model to be served with TensorFlow Serving using Docker. (https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py)
I followed some instructions on the internet and modified export_model as shown below:
def export_model(module_spec, class_count, saved_model_dir):
    """Exports model for serving.

    Args:
      module_spec: The hub.ModuleSpec for the image module being used.
      class_count: The number of classes.
      saved_model_dir: Directory in which to save exported model and variables.
    """
    # The SavedModel should hold the eval graph.
    sess, in_image, _, _, _, _ = build_eval_session(module_spec, class_count)
    # Shape of [None] means we can have a batch of images.
    image = tf.placeholder(shape=[None], dtype=tf.string)
    with sess.graph.as_default() as graph:
        tf.saved_model.simple_save(
            sess,
            saved_model_dir,
            #inputs={'image': in_image},
            inputs = {'image_bytes': image},
            outputs={'prediction': graph.get_tensor_by_name('final_result:0')},
            legacy_init_op=tf.group(tf.tables_initializer(), name='legacy_init_op')
        )
The problem is that when I try to call the API using Postman, it returns this error:
{
    "error": "Tensor Placeholder_1:0, specified in either feed_devices or fetch_devices was not found in the Graph"
}
Do I need to modify the retraining process so it can accept base64 input?

Tensorflow ServingInputReceiver input shape error in client

I'm currently working with the TensorFlow Estimator API and am having problems with the confusing serving options that are available. My confusion comes from the sparse TensorFlow documentation.
This is my goal:
Use tensorflow-serving prediction_service_pb2 by sending a serialized proto message as string to the ServingInputReceiver function of my exported Estimator model. I expect the ServingInputReceiver function to receive the serialized proto string on the "input" tensor which then will deserialize it to the features "ink" (=varlength float array) and "shape" (=fixedlength int64).
This is my (implementation of google quickdraw model) estimator Input function:
def _parse_tfexample_fn(example_proto, mode):
    """Parse a single record which is expected to be a tensorflow.Example."""
    feature_to_type = {
        "ink": tf.VarLenFeature(dtype=tf.float32),
        "shape": tf.FixedLenFeature([2], dtype=tf.int64)
    }
    if mode != tf.estimator.ModeKeys.PREDICT:
        # The labels won't be available at inference time, so don't add them
        # to the list of feature_columns to be read.
        feature_to_type["class_index"] = tf.FixedLenFeature([1], dtype=tf.int64)
    parsed_features = tf.parse_single_example(example_proto, feature_to_type)
    parsed_features["ink"] = tf.sparse_tensor_to_dense(parsed_features["ink"])
    if mode != tf.estimator.ModeKeys.PREDICT:
        labels = parsed_features["class_index"]
        return parsed_features, labels
    else:
        return parsed_features  # In prediction, we have no labels
This is my Serving Input Function:
def serving_input_receiver_fn():
    """An input receiver that expects a serialized tf.Example."""
    feature_to_type = {"ink": tf.VarLenFeature(dtype=tf.float32), "shape": tf.FixedLenFeature([2], dtype=tf.int64)}
    serialized_tf_example = tf.placeholder(dtype=tf.string, shape=[None], name='input')
    parsed_features = tf.parse_example(serialized_tf_example, feature_to_type)
    parsed_features["ink"] = tf.sparse_tensor_to_dense(parsed_features["ink"])
    return tf.estimator.export.ServingInputReceiver(parsed_features, serialized_tf_example)
This is my client.py request:
features = {}
features["ink"] = tf.train.Feature(float_list=tf.train.FloatList(value=np_ink.flatten()))
features["shape"] = tf.train.Feature(int64_list=tf.train.Int64List(value=np_ink.shape))
f = tf.train.Features(feature=features)
data = tf.train.Example(features=f)
serialized=data.SerializeToString() # tensor to byte string
request.inputs['input'].ParseFromString(tf.contrib.util.make_tensor_proto(serialized, shape=[1], verify_shape=True))
And this is the error I get after calling the Predict function in client.py
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="input tensor alias not found in signature: ink. Inputs expected to be in the set {input}.")
I tried the following serving input functions:
ServingInputReceiver and build_raw_serving_input_receiver_fn give me the same gRPC error. When I use build_parsing_serving_input_receiver_fn, it won't even export my model. I tried to wrap my head around the documentation, but it lacks detail and I don't understand when to use which serving input function.