Tensorflow Serving, online predictions: How to build a signature_def that accepts 'image_bytes' as input tensor name? - tensorflow-serving

I have successfully trained a Keras model and used it for predictions on my local machine. Now I want to deploy it using TensorFlow Serving. My model takes images as input and returns a mask prediction.
According to the documentation here, my instances need to be formatted like this:
{'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}
Now, the saved_model.pb file automatically saved by my Keras model has the following tensor names:
input_tensor = graph.get_tensor_by_name('input_image:0')
output_tensor = graph.get_tensor_by_name('conv2d_23/Sigmoid:0')
Therefore I need to save a new saved_model.pb file with a different signature_def.
I tried the following (see here for reference), which works:
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ['serve'], 'path/to/saved/model/')
    graph = tf.get_default_graph()
    input_tensor = graph.get_tensor_by_name('input_image:0')
    output_tensor = graph.get_tensor_by_name('conv2d_23/Sigmoid:0')
    tensor_info_input = tf.saved_model.utils.build_tensor_info(input_tensor)
    tensor_info_output = tf.saved_model.utils.build_tensor_info(output_tensor)
    prediction_signature = (
        tf.saved_model.signature_def_utils.build_signature_def(
            inputs={'image_bytes': tensor_info_input},
            outputs={'output_bytes': tensor_info_output},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
    builder = tf.saved_model.builder.SavedModelBuilder('path/to/saved/new_model/')
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'predict_images': prediction_signature})
    builder.save()
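(As a quick sanity check before deploying, the re-exported SavedModel can be reloaded and its signature printed; a minimal sketch, assuming the path and signature name used above:)
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], 'path/to/saved/new_model/')
    # Prints the input/output keys and, importantly, the dtype the server will expect.
    print(meta_graph.signature_def['predict_images'])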
But when I deploy the model and request predictions from AI Platform, I get the following error:
RuntimeError: Prediction failed: Error processing input: Expected float32, got {'b64': 'Prm4OD7JyEg+paQkPrGwMD7BwEA'} of type 'dict' instead.
Adapting the answer here, I also tried to rewrite
input_tensor = graph.get_tensor_by_name('input_image:0')
as
image_placeholder = tf.placeholder(tf.string, name='b64')
graph_input_def = graph.as_graph_def()
input_tensor, = tf.import_graph_def(
    graph_input_def,
    input_map={'b64:0': image_placeholder},
    return_elements=['input_image:0'])
with the (wrong) understanding that this would add a layer on top of my input tensor, with a matching 'b64' name (as per the documentation), that accepts a string and connects it to the original input tensor.
But the error from AI Platform is the same.
(The relevant code I use for requesting a prediction is:
instances = [{'image_bytes': {'b64': base64.b64encode(image).decode()}}]
response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()
where image is a numpy.ndarray of dtype('float32').)
I feel I'm close, but I'm definitely missing something. Can you please help?

After base64 encoding and decoding, the image buffer arrives as a string, which does not fit your model's float32 input type.
You may try adding some preprocessing to your model and sending the b64 request again.
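One way to add that preprocessing is to wrap the trained Keras model with a string input that decodes and resizes the incoming image bytes before they reach the float32 input, and to export that wrapper instead. Below is a minimal TF 1.x sketch, assuming the trained model can still be loaded (here from a hypothetical .h5 file) and expects H x W RGB images scaled to [0, 1]; the paths, sizes and normalisation are placeholders to adapt:
import tensorflow as tf

H, W = 256, 256  # assumed training resolution

def decode_and_preprocess(jpeg_bytes):
    # Decode a single JPEG string into a float image matching the training pipeline.
    img = tf.image.decode_jpeg(jpeg_bytes, channels=3)
    img = tf.image.resize_images(img, [H, W])
    return tf.cast(img, tf.float32) / 255.0  # adapt to your own normalisation

with tf.keras.backend.get_session() as sess:
    keras_model = tf.keras.models.load_model('path/to/model.h5')  # hypothetical path

    # String input; tf.map_fn decodes every element of the batch.
    bytes_in = tf.keras.layers.Input(shape=(), dtype=tf.string, name='image_bytes')
    decoded = tf.keras.layers.Lambda(
        lambda b: tf.map_fn(decode_and_preprocess, b, dtype=tf.float32))(bytes_in)
    serving_model = tf.keras.Model(bytes_in, keras_model(decoded))

    # An input key ending in '_bytes' tells AI Platform to base64-decode the
    # {'b64': ...} payload before it reaches the string tensor.
    tf.saved_model.simple_save(
        sess,
        'path/to/saved/new_model/',
        inputs={'image_bytes': serving_model.input},
        outputs={'output_bytes': serving_model.output})
With an export like this, the request shown earlier ({'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}) should match the signature, because the graph now receives encoded image bytes instead of a float array.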

Related

RoBERTa example from tfhub produces error "During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string"

I would like to use the roberta-base model from tfhub. I am trying to run the example below, but I get an error when I try to feed sentences to the model as input: Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string. I am using Python 3.7, TensorFlow 2.5, and tensorflow_hub 0.12.
If I replace the preprocessor and encoder with the corresponding BERT versions (the first two hub.KerasLayer lines below), the code works. However, I would like it to work for RoBERTa as well, as in the model definition that follows.
import tensorflow as tf
import tensorflow_hub as hub

# BERT preprocessor/encoder (these work):
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=True)

# Define a text embedding model with the RoBERTa preprocessor/encoder (this fails on GPU):
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocessor = hub.KerasLayer("https://tfhub.dev/jeongukjae/roberta_en_cased_preprocess/1")
encoder_inputs = preprocessor(text_input)
encoder = hub.KerasLayer("https://tfhub.dev/jeongukjae/roberta_en_cased_L-12_H-768_A-12/1", trainable=True)
encoder_outputs = encoder(encoder_inputs)
pooled_output = encoder_outputs["pooled_output"]      # [batch_size, 768]
sequence_output = encoder_outputs["sequence_output"]  # [batch_size, seq_length, 768]
model = tf.keras.Model(text_input, pooled_output)

# You can embed your sentences as follows
sentences = tf.constant(["(your text here)"])
print(model(sentences))
Additionally, the code above with the RoBERTa preprocessor/encoder seems to work if I use the CPU instead of the GPU (by wrapping the call in with tf.device('/cpu:0')), but this is not feasible because I need to fine-tune the model on a lot of data.
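(For reference, the workaround mentioned above is just a device scope around the call; whether the whole model or only the preprocessing needs to be pinned is an assumption here:)
# Sketch of the CPU workaround: run the call under a CPU device scope so the
# string tensors are never copied to the GPU.
with tf.device('/cpu:0'):
    print(model(sentences))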

Tensorflow Model to TFLITE

I have this code for building a semantic search engine using the pre-trained Universal Sentence Encoder from TensorFlow Hub. I am not able to convert it to TFLite. I have saved the model to my directory.
Importing the model:
import tensorflow as tf
import tensorflow_hub as hub

module_path = "/content/drive/My Drive/4"
%time model = hub.load(module_path)
#print ("module %s loaded" % module_url)

# Create a function for calling the model
def embed(input):
    return model(input)
Training the model on data:
## training the model
Model_USE = embed(data)
Saving the model:
exported = tf.train.Checkpoint(v=tf.Variable(Model_USE))
exported.f = tf.function(
    lambda x: exported.v * x,
    input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])

export_dir = "/content/drive/My Drive/"
tf.saved_model.save(exported, export_dir)
Saving works fine, but when I convert to TFLite it gives an error.
Conversion code:
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
Error:
as_list() is not defined on an unknown TensorShape.
First, you need to add a data generator that provides representative inputs for the converter, like this:
def representative_data_gen():
    for input_value in dataset.take(100):
        yield [input_value]
The input value must have shape (1, your_input_shape), as if it had a batch size of 1, and it has to be yielded as a list; that is mandatory.
You should also declare which type of optimization you want, for example:
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
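Putting those two pieces together with the original from_saved_model converter might look like this (a sketch; representative_data_gen, export_dir and the supported op sets are taken from the snippets above):
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.representative_dataset = representative_data_gen  # generator defined above
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()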
Nevertheless, I have also encountered problems with the various converter options depending on the network structure, which in this case I do not know. So, for a clean run of the converter, I would just do:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tfmodel = converter.convert()
Setting converter.experimental_new_converter = True helps with problems when converting RNNs, as in https://github.com/tensorflow/tensorflow/issues/34813.
EDIT:
As explained in ValueError: None is only supported in the 1st dimension. Tensor 'flatbuffer_data' has invalid shape '[None, None, 1, 512]', TFLite only allows the first dimension of your data (the batch) to be None; all other dimensions must be fixed. Try padding them with, for example, tf.keras.preprocessing.sequence.pad_sequences, and then mask the padded values in the network, as described in tensorflow.org/guide/keras/masking_and_padding, with Embedding or Masking layers.
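A minimal sketch of that padding-plus-masking pattern (the maximum length, vocabulary size and toy sequences below are made up; adapt them to your data):
import tensorflow as tf

MAX_LEN = 128       # assumed fixed length so every dimension except the batch is static
VOCAB_SIZE = 30000  # assumed vocabulary size

# 'sequences' stands in for your variable-length token-id lists.
sequences = [[12, 7, 99], [5, 42, 13, 8, 1]]
padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=MAX_LEN, padding='post', value=0)

inputs = tf.keras.layers.Input(shape=(MAX_LEN,), dtype='int32')
x = tf.keras.layers.Embedding(VOCAB_SIZE, 64, mask_zero=True)(inputs)  # masks the padded zeros
x = tf.keras.layers.LSTM(32)(x)  # the mask is propagated, so padding is ignored
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)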

Deploying a TensorFlow model on Google Cloud that receives a base64 encoded string as a model input

I have successfully set up Google Cloud and deployed a pre-trained ML model that takes an input tensor (image) of shape=(?, 224, 224, 3) and dtype=float32. It works well, but this is inefficient when making REST requests and should really use a base64-encoded string. The challenge is that I am using transfer learning and cannot control the input of the original pre-trained model. To get around this without adding additional infrastructure, I created a small graph (wrapper) that handles the base64-to-array conversion and connected it to my pre-trained model graph, yielding a new single graph. The small graph takes an input tensor of shape=(), dtype=string and returns a tensor of shape=(224, 224, 3), dtype=float32, which can then be passed to the original model. The model exports to a .pb file without errors and deploys successfully, but I get the following error when making my POST request:
{'error': 'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Index out of range using input dim 0; input has only 0 dims\n\t [[{{node lambda/map/while/strided_slice}}]]")'}
POST request body:
{'instances': [{'b64': 'iVBORw0KGgoAAAANSUhEUgAAAOAA...'}]}
This error leads me to believe the POST request is incorrectly formatted for handling the base64 string, or that the input of my base64 conversion graph is set up incorrectly. I can run the code locally by calling predict on my combined model with a locally constructed tensor of shape=(), dtype=string and get a result successfully.
Here is my code for combining the 2 graphs:
import tensorflow as tf

# Local dependencies
from myProject.classifier_models import mobilenet
from myProject.dataset_loader import dataset_loader
from myProject.utils import f1_m, recall_m, precision_m

with tf.keras.backend.get_session() as sess:

    def preprocess_and_decode(img_str, new_shape=[224, 224]):
        #img = tf.io.decode_base64(img_str)
        img = tf.image.decode_png(img_str, channels=3)
        img = (tf.cast(img, tf.float32) / 127.5) - 1
        img = tf.image.resize_images(img, new_shape, method=tf.image.ResizeMethod.AREA, align_corners=False)
        # If you need to squeeze your input range to [0,1] or [-1,1] do it here
        return img

    InputLayer = tf.keras.layers.Input(shape=(1,), dtype="string")
    OutputLayer = tf.keras.layers.Lambda(
        lambda img: tf.map_fn(lambda im: preprocess_and_decode(im[0]), img, dtype="float32"))(InputLayer)
    base64_model = tf.keras.Model(InputLayer, OutputLayer)

    tf.keras.backend.set_learning_phase(0)  # Ignore dropout at inference
    transfer_model = tf.keras.models.load_model(
        './trained_model/mobilenet_93.h5',
        custom_objects={'f1_m': f1_m, 'recall_m': recall_m, 'precision_m': precision_m})
    sess.run(tf.global_variables_initializer())

    base64_input = base64_model.input
    final_output = transfer_model(base64_model.output)
    new_model = tf.keras.Model(base64_input, final_output)

    export_path = '../myModels/001'
    tf.saved_model.simple_save(
        sess,
        export_path,
        inputs={'input_class': new_model.input},
        outputs={'output_class': new_model.output})
Tech: TensorFlow 1.13.1 & Python 3.5
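For what it's worth, a local smoke test of the exported SavedModel makes the expected input shape explicit. A sketch, assuming the export path above and a test PNG on disk; note that the wrapper as written takes raw PNG bytes (decode_base64 is commented out) in a batch of shape (None, 1):
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], '../myModels/001')
    sig = meta_graph.signature_def['serving_default']
    input_name = sig.inputs['input_class'].name
    output_name = sig.outputs['output_class'].name
    with open('test.png', 'rb') as f:  # hypothetical test image
        png_bytes = f.read()
    # One instance, each instance a length-1 vector of strings, matching the
    # Input(shape=(1,), dtype="string") wrapper above.
    result = sess.run(output_name, feed_dict={input_name: [[png_bytes]]})
    print(result.shape)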
I have looked at a bunch of related posts such as:
https://stackoverflow.com/a/50606625
https://stackoverflow.com/a/42859733
http://www.voidcn.com/article/p-okpgbnul-bvs.html (right-click translate to english)
https://cloud.google.com/ml-engine/docs/tensorflow/online-predict
Any suggestions or feedback would be greatly appreciated!
Update 06/12/2019:
Inspecting the three graph summaries, everything appears to be correctly merged.
Update 06/14/2019:
I ended up going with this alternative strategy instead, implementing a tf.estimator.

Using the Estimator interface for inference with a pre-trained TensorFlow object detection model

I'm trying to load a pre-trained TensorFlow object detection model from the TensorFlow Object Detection repo as a tf.estimator.Estimator and use it to make predictions.
I'm able to load the model and run inference using Estimator.predict(); however, the output is garbage. Other methods of loading the model, e.g. as a Predictor, work fine for inference.
Any help with properly loading the model as an Estimator and calling predict() would be much appreciated. My current code:
Load and prepare image
import numpy as np
import requests
import tensorflow as tf
from io import BytesIO
from PIL import Image

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(list(image.getdata())).reshape((im_height, im_width, 3)).astype(np.uint8)

image_url = 'https://i.imgur.com/rRHusZq.jpg'

# Load image
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# Format original image size
im_size_orig = np.array(list(image.size) + [1])
im_size_orig = np.expand_dims(im_size_orig, axis=0)
im_size_orig = np.int32(im_size_orig)

# Resize image
image = image.resize((np.array(image.size) / 4).astype(int))

# Format image
image_np = load_image_into_numpy_array(image)
image_np_expanded = np.expand_dims(image_np, axis=0)
image_np_expanded = np.float32(image_np_expanded)

# Stick into feature dict
x = {'image': image_np_expanded, 'true_image_shape': im_size_orig}

# Stick into input function
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=x,
    y=None,
    shuffle=False,
    batch_size=128,
    queue_capacity=1000,
    num_epochs=1,
    num_threads=1,
)
Side note:
train_and_eval_dict also seems to contain an input_fn for prediction
train_and_eval_dict['predict_input_fn']
However, this actually returns a tf.estimator.export.ServingInputReceiver, which I'm not sure what to do with. This could potentially be the source of my problems, as there's a fair bit of pre-processing involved before the model actually sees the image.
Load model as Estimator
The model was downloaded from the TF Model Zoo here; the code to load it was adapted from here.
import os
import tensorflow as tf
from object_detection import model_hparams, model_lib

model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/'
pipeline_config_path = os.path.join(model_dir, 'pipeline.config')

config = tf.estimator.RunConfig(model_dir=model_dir)

train_and_eval_dict = model_lib.create_estimator_and_inputs(
    run_config=config,
    hparams=model_hparams.create_hparams(None),
    pipeline_config_path=pipeline_config_path,
    train_steps=None,
    sample_1_of_n_eval_examples=1,
    sample_1_of_n_eval_on_train_examples=(5))

estimator = train_and_eval_dict['estimator']
Run inference
output_dict1 = estimator.predict(predict_input_fn)
This prints out some log messages, one of which is:
INFO:tensorflow:Restoring parameters from ./pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt
So it seems like the pre-trained weights are getting loaded. However, the results look like this:
Load same model as a Predictor
from tensorflow.contrib import predictor
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28'
saved_model_dir = os.path.join(model_dir, 'saved_model')
predict_fn = predictor.from_saved_model(saved_model_dir)
Run inference
output_dict2 = predict_fn({'inputs': image_np_expanded})
Results look good:
When you load the model as an Estimator from a checkpoint file, here is the restore function associated with SSD models, from ssd_meta_arch.py:
def restore_map(self,
                fine_tune_checkpoint_type='detection',
                load_all_detection_checkpoint_vars=False):
    """Returns a map of variables to load from a foreign checkpoint.

    See parent class for details.

    Args:
      fine_tune_checkpoint_type: whether to restore from a full detection
        checkpoint (with compatible variable names) or to restore from a
        classification checkpoint for initialization prior to training.
        Valid values: `detection`, `classification`. Default 'detection'.
      load_all_detection_checkpoint_vars: whether to load all variables (when
        `fine_tune_checkpoint_type='detection'`). If False, only variables
        within the appropriate scopes are included. Default False.

    Returns:
      A dict mapping variable names (to load from a checkpoint) to variables in
      the model graph.

    Raises:
      ValueError: if fine_tune_checkpoint_type is neither `classification`
        nor `detection`.
    """
    if fine_tune_checkpoint_type not in ['detection', 'classification']:
        raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
            fine_tune_checkpoint_type))
    if fine_tune_checkpoint_type == 'classification':
        return self._feature_extractor.restore_from_classification_checkpoint_fn(
            self._extract_features_scope)
    if fine_tune_checkpoint_type == 'detection':
        variables_to_restore = {}
        for variable in tf.global_variables():
            var_name = variable.op.name
            if load_all_detection_checkpoint_vars:
                variables_to_restore[var_name] = variable
            else:
                if var_name.startswith(self._extract_features_scope):
                    variables_to_restore[var_name] = variable
        return variables_to_restore
As you can see, even if the config file sets from_detection_checkpoint: true, only the variables in the feature extractor scope will be restored. To restore all the variables, you will have to set
load_all_detection_checkpoint_vars: true
in the config file.
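For reference, a sketch of where that flag lives in pipeline.config (the checkpoint path is a placeholder):
train_config {
  fine_tune_checkpoint: "path/to/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  load_all_detection_checkpoint_vars: true
}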
So the situation above is quite clear. When the model is loaded as an Estimator, only the variables from the feature extractor scope are restored; the weights in the predictor scopes are not, so the Estimator will obviously give random predictions.
When the model is loaded as a Predictor, all weights are loaded, and thus the predictions are reasonable.

Error when calling prediction with base64 input

I am using the example from TensorFlow Hub's image retraining tutorial to export a saved_model to be served with TensorFlow Serving using Docker. (https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py)
I followed some instructions on the internet and modified export_model as below:
def export_model(module_spec, class_count, saved_model_dir):
    """Exports model for serving.

    Args:
      module_spec: The hub.ModuleSpec for the image module being used.
      class_count: The number of classes.
      saved_model_dir: Directory in which to save exported model and variables.
    """
    # The SavedModel should hold the eval graph.
    sess, in_image, _, _, _, _ = build_eval_session(module_spec, class_count)

    # Shape of [None] means we can have a batch of images.
    image = tf.placeholder(shape=[None], dtype=tf.string)

    with sess.graph.as_default() as graph:
        tf.saved_model.simple_save(
            sess,
            saved_model_dir,
            #inputs={'image': in_image},
            inputs={'image_bytes': image},
            outputs={'prediction': graph.get_tensor_by_name('final_result:0')},
            legacy_init_op=tf.group(tf.tables_initializer(), name='legacy_init_op')
        )
The problem is that when I try to call the API using Postman, it comes back with this error:
{
"error": "Tensor Placeholder_1:0, specified in either feed_devices or fetch_devices was not found in the Graph"
}
Do I need to modify the retraining process so it can accept base64 input?