ML engine serving seems to not be working as intended - tensorflow

When using the following code and running gcloud ml-engine local predict, I get:
InvalidArgumentError (see above for traceback): You must feed a value
for placeholder tensor 'Placeholder' with dtype string and shape [?]
[[Node: Placeholder = Placeholderdtype=DT_STRING, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] (Error code: 2)
import tensorflow as tf

tf_files_path = './tf'
# os.makedirs(tf_files_path)  # temp dir
estimator = tf.keras.estimator.model_to_estimator(
    keras_model_path="model_data/yolo.h5",
    model_dir=tf_files_path)
# up_one_dir(os.path.join(tf_files_path, 'keras'))

def serving_input_receiver_fn():
    def prepare_image(image_str_tensor):
        image = tf.image.decode_jpeg(image_str_tensor, channels=3)
        image = tf.divide(image, 255)
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image

    # Ensure the model is batchable
    # https://stackoverflow.com/questions/52303403/
    input_ph = tf.placeholder(tf.string, shape=[None])
    images_tensor = tf.map_fn(
        prepare_image, input_ph, back_prop=False, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(
        # `model` is the Keras model loaded from yolo.h5 earlier (not shown here)
        {model.input_names[0]: images_tensor},
        {'image_bytes': input_ph})

export_path = './export'
estimator.export_savedmodel(
    export_path,
    serving_input_receiver_fn=serving_input_receiver_fn)
The JSON I am sending to ML Engine looks like this:
{"image_bytes": {"b64": "/9j/4AAQSkZJRgABAQAAAQABAAD/2w..."}}
When not doing a local prediction, but sending it to ML engine itself, I get:
ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
"error": {
"code": 500,
"message": "Internal error encountered.",
"status": "INTERNAL"
}
}
The saved_model_cli gives:
saved_model_cli show --all --dir export/1547848897/
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['conv2d_59'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 255)
        name: conv2d_59/BiasAdd:0
    outputs['conv2d_67'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 255)
        name: conv2d_67/BiasAdd:0
    outputs['conv2d_75'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 255)
        name: conv2d_75/BiasAdd:0
  Method name is: tensorflow/serving/predict
Does anyone see what is going wrong here?

The issue has been resolved. The output of the model turned out to be too big for ML Engine to send back, and this was not surfaced in anything more relevant than the 500 internal error. We added some post-processing steps to the model and it works fine now.
The error from the gcloud ml-engine local predict command appears to be a bug: the model now works on ML Engine, yet local prediction still returns this error.
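The post does not show which post-processing steps were added; purely as a hedged sketch, one way to shrink the response would be to keep only the k largest activations of each output map before converting the model to an estimator (the top_k_per_map helper and k=100 are hypothetical, not the original post-processing):
# Hedged sketch, not the original post-processing: reduce the size of the served
# response by keeping only the k largest activations of each YOLO output map.
import tensorflow as tf

keras_model = tf.keras.models.load_model("model_data/yolo.h5")

def top_k_per_map(feature_map, k=100):  # k is a hypothetical cut-off
    flat = tf.reshape(feature_map, [tf.shape(feature_map)[0], -1])
    values, _ = tf.nn.top_k(flat, k=k)
    return values

reduced = [tf.keras.layers.Lambda(top_k_per_map)(out) for out in keras_model.outputs]
small_model = tf.keras.Model(inputs=keras_model.inputs, outputs=reduced)
estimator = tf.keras.estimator.model_to_estimator(keras_model=small_model,
                                                  model_dir=tf_files_path)  # tf_files_path as in the question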

Related

tensorflow serving request error, {The second input must be a scalar, but it has shape [1] }

I am new to TensorFlow Serving; here's the data I post:
data_0 = {'inputs/input_x': mfcc[0], "inputs/is_training":False, "inputs/keep_prob":1}
data = json.dumps({"signature_name":'serving_default', 'instances':[data_0]})
headers = {"Content-type": "application/json"}
json_response = requests.post(URL, data=data, headers=headers)
I got an error which comes from the is_training flag of batch normalization (and I think the same for the dropout rate):
{ "error": "The second input must be a scalar, but it has shape [1]\n\t [[{{node conv_layer2/conv2/batch_normalization/cond/Switch}}]]" }
Then I saw a similar issue and modified my code to:
data_0 = {'inputs/input_x': mfcc[0], "inputs/is_training":[False], "inputs/keep_prob":[1]}
Then I got one more dimension:
{ "error": "The second input must be a scalar, but it has shape [1,1]\n\t [[{{node conv_layer2/conv2/batch_normalization/cond/Switch}}]]" }
And I tried to post without [], like:
data = json.dumps({"signature_name":'serving_default', 'instances':data_0})
I got :
"error": "JSON Value:{...} Excepting \'instances\' to be an list/array" }
Information about my model:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs/input_x'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 34, 20)
        name: inputs/input_x:0
    inputs['inputs/is_training'] tensor_info:
        dtype: DT_BOOL
        shape: unknown_rank
        name: inputs/is_training:0
    inputs['inputs/keep_prob'] tensor_info:
        dtype: DT_FLOAT
        shape: unknown_rank
        name: inputs/keep_prob:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['Softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: Softmax:0
  Method name is: tensorflow/serving/predict
Need some help here, thanks!
For now I haven't gotten an answer, so I re-trained my model without the is_training flag, and that works.
So maybe there is a problem with boolean values.
When using instances as the input key, all inputs must have the same 0-th dimension. Try inputs instead of instances. For example,
data = json.dumps({"signature_name":'serving_default', 'inputs':data_0})
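A fuller request along those lines might look like the following hedged sketch; the URL is a placeholder and mfcc comes from the question above:
# Hedged sketch of the columnar "inputs" format: the batched tensor carries its own
# 0-th dimension, while the two flags stay scalars to match their unknown_rank signature.
import json
import requests

URL = "http://localhost:8501/v1/models/my_model:predict"  # placeholder endpoint

payload = {
    "signature_name": "serving_default",
    "inputs": {
        "inputs/input_x": [mfcc[0]],  # one instance, shape (1, 34, 20)
        "inputs/is_training": False,  # scalar bool
        "inputs/keep_prob": 1.0,      # scalar float
    },
}
response = requests.post(URL, data=json.dumps(payload),
                         headers={"Content-type": "application/json"})
print(response.json())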

Incompatible shapes: [11,768] vs. [1,5,768] - Inference in production with a huggingface saved model

I have saved a pre-trained version of DistilBERT, distilbert-base-uncased-finetuned-sst-2-english, from Hugging Face models, and I am attempting to serve it via TensorFlow Serving and make predictions. Everything is currently being tested in Colab.
I am having trouble getting the prediction request into the correct format for the model via TensorFlow Serving. TensorFlow Serving is up and running fine and serving the model, but my prediction code is not correct and I need some help understanding how to make a prediction via JSON over the API.
# tokenize and encode a simple positive instance
instances = tokenizer.tokenize('this is the best day of my life!')
instances = tokenizer.encode(instances)
data = json.dumps({"signature_name": "serving_default", "instances": instances, })
print(data)
{"signature_name": "serving_default", "instances": [101, 2023, 2003, 1996, 2190, 2154, 1997, 2026, 2166, 999, 102]}
# setup json_response object
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)
predictions
{'error': '{{function_node __inference__wrapped_model_52602}} {{function_node __inference__wrapped_model_52602}} Incompatible shapes: [11,768] vs. [1,5,768]\n\t [[{{node tf_distil_bert_for_sequence_classification_3/distilbert/embeddings/add}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall]]'}
Any direction here would be appreciated.
I was able to find the solution by setting signatures for the input shape and attention mask, shown below. This is a simple implementation that uses a fixed input shape for the saved model and requires you to pad the inputs to the expected length of 384. I have seen implementations that call custom signatures and build the model to match expected input shapes, but the simple case below worked for what I wanted to accomplish when serving a Hugging Face model via TF Serving. If anyone has better examples or ways to extend this functionality, please post them for future use.
# create callable
from transformers import TFDistilBertForQuestionAnswering
distilbert = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')
callable = tf.function(distilbert.call)
By calling get_concrete_function, we trace-compile the TensorFlow operations of the model for an input signature composed of two Tensors of shape [None, 384], the first one being the input ids and the second one the attention mask.
concrete_function = callable.get_concrete_function([tf.TensorSpec([None, 384], tf.int32, name="input_ids"), tf.TensorSpec([None, 384], tf.int32, name="attention_mask")])
save the model with the signatures:
# stored model path for TF Serve (1 = version 1) --> '/path/to/my/model/distilbert_qa/1/'
distilbert_qa_save_path = 'path_to_model'
tf.saved_model.save(distilbert, distilbert_qa_save_path, signatures=concrete_function)
check to see that it contains the correct signature:
saved_model_cli show --dir 'path_to_model' --tag_set serve --signature_def serving_default
output should look like:
The given SavedModel SignatureDef contains the following input(s):
  inputs['attention_mask'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_attention_mask:0
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:0
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict
TEST MODEL:
from transformers import DistilBertTokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased')
question, text = "Who was Benjamin?", "Benjamin was a silly dog."
input_dict = tokenizer(question, text, return_tensors='tf')
start_scores, end_scores = distilbert(input_dict)
all_tokens = tokenizer.convert_ids_to_tokens(input_dict["input_ids"].numpy()[0])
answer = ' '.join(all_tokens[tf.math.argmax(start_scores, 1)[0] : tf.math.argmax(end_scores, 1)[0]+1])
FOR TF SERVING (in Colab), which was my original intent:
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
!apt-get install tensorflow-model-server
import os
# path_to_model --> versions directory --> '/path/to/my/model/distilbert_qa/'
# actual stored model path version 1 --> '/path/to/my/model/distilbert_qa/1/'
MODEL_DIR = 'path_to_model'
os.environ["MODEL_DIR"] = os.path.abspath(MODEL_DIR)
%%bash --bg
nohup tensorflow_model_server --rest_api_port=8501 --model_name=my_model --model_base_path="${MODEL_DIR}" >server.log 2>&1
!tail server.log
MAKE A POST REQUEST:
import json
!pip install -q requests
import requests
import numpy as np
max_length = 384 # must equal model signature expected input value
question, text = "Who was Benjamin?", "Benjamin was a good boy."
# padding='max_length' pads the input to the expected input length (else incompatible shapes error)
input_dict = tokenizer(question, text, return_tensors='tf', padding='max_length', max_length=max_length)
input_ids = input_dict["input_ids"].numpy().tolist()[0]
att_mask = input_dict["attention_mask"].numpy().tolist()[0]
features = [{'input_ids': input_ids, 'attention_mask': att_mask}]
data = json.dumps({ "signature_name": "serving_default", "instances": features})
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict', data=data, headers=headers)
print(json_response)
predictions = json.loads(json_response.text)['predictions']
all_tokens = tokenizer.convert_ids_to_tokens(input_dict["input_ids"].numpy()[0])
answer = ' '.join(all_tokens[tf.math.argmax(predictions[0]['output_0']) : tf.math.argmax(predictions[0]['output_1'])+1])
print(answer)

`Table not initialized` when predicting with AI-platform

I'm trying to save the Faster R-CNN hub model and use it with AI Platform's gcloud ai-platform local predict. The error I'm getting is:
Failed to run the provided model: Exception during running the graph: [_Derived_] Table not initialized.\n\t [[{{node hub_input/index_to_string_1_Lookup}}]]\n\t [[StatefulPartitionedCall_1/StatefulPartitionedCall/model/keras_layer/StatefulPartitionedCall]] (Error code: 2)\n'
The code for saving the model:
import tensorflow as tf
import tensorflow_hub as hub

model_url = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"

input = tf.keras.Input(shape=(), dtype=tf.string)
decoded = tf.keras.layers.Lambda(
    lambda y: tf.map_fn(
        lambda x: tf.image.resize(
            tf.image.convert_image_dtype(
                tf.image.decode_jpeg(x, channels=3), tf.float32), (416, 416)
        ),
        tf.io.decode_base64(y), dtype=tf.float32)
)(input)
results = hub.KerasLayer(model_url, signature_outputs_as_dict=True)(decoded)
model = tf.keras.Model(inputs=input, outputs=results)
model.save("./saved_model", save_format="tf")
The model works when I load with tf.keras.models.load_model("./saved_model") and predict with it, but not with ai-platform local predict.
Command for ai-platform local predictions:
gcloud ai-platform local predict --model-dir ./saved_model --json-instances data.json --framework TENSORFLOW
Versions:
python 3.7.0
tensorflow==2.2.0
tensorflow-hub==0.7.0
Output of saved_model_cli:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: serving_default_image_bytes:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['keras_layer'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 4)
        name: StatefulPartitionedCall_1:0
    outputs['keras_layer_1'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 1)
        name: StatefulPartitionedCall_1:1
    outputs['keras_layer_2'] tensor_info:
        dtype: DT_INT64
        shape: (-1, 1)
        name: StatefulPartitionedCall_1:2
    outputs['keras_layer_3'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 1)
        name: StatefulPartitionedCall_1:3
    outputs['keras_layer_4'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall_1:4
  Method name is: tensorflow/serving/predict
Any ideas on how to fix the error?
The problem is that your input is being interpreted as a scalar. Do:
input = tf.keras.Input(shape=(1,), dtype=tf.string)
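Applied to the model-building code from the question, a hedged sketch might look like this; the tf.squeeze call is an extra assumption (to drop the added dimension before decoding each element) and is not part of the original answer:
# Hedged sketch: with shape=(1,) the serving input has shape [batch, 1], so the extra
# axis is squeezed (an assumption) before base64-decoding and mapping over the batch.
import tensorflow as tf
import tensorflow_hub as hub

model_url = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"

input = tf.keras.Input(shape=(1,), dtype=tf.string)
decoded = tf.keras.layers.Lambda(
    lambda y: tf.map_fn(
        lambda x: tf.image.resize(
            tf.image.convert_image_dtype(
                tf.image.decode_jpeg(x, channels=3), tf.float32), (416, 416)),
        tf.io.decode_base64(tf.squeeze(y, axis=1)), dtype=tf.float32)
)(input)
results = hub.KerasLayer(model_url, signature_outputs_as_dict=True)(decoded)
model = tf.keras.Model(inputs=input, outputs=results)
model.save("./saved_model", save_format="tf")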

Deploying a Keras model with TensorFlow Serving gets 501 Server Error: Not Implemented for url: http://localhost:8501/v1/models/genre:predict

I saved a Keras .h5 model to .pb using SavedModelBuilder. After deploying my model with the tensorflow/serving:1.14.0 Docker image, when I run the prediction step I get "requests.exceptions.HTTPError: 501 Server Error: Not Implemented for url: http://localhost:8501/v1/models/genre:predict"
The model-building code is as follows:
from keras import backend as K
import tensorflow as tf
from keras.models import load_model

model = load_model('/home/li/model.h5')
model_signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'input': model.input}, outputs={'output': model.output})

# export_path = os.path.join(model_path, model_version)
export_path = "/home/li/genre/1"
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
builder.add_meta_graph_and_variables(
    sess=K.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict': model_signature,
        'serving_default': model_signature
    })
builder.save()
Then I got the .pb model:
When I run saved_model_cli show --dir /home/li/genre/1 --all, the saved .pb model information is as follows:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1, 128, 1292)
        name: conv2d_1_input_2:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 19)
        name: dense_2_2/Softmax:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1, 128, 1292)
        name: conv2d_1_input_2:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 19)
        name: dense_2_2/Softmax:0
  Method name is: tensorflow/serving/predict
The command I use to deploy with the tensorflow/serving Docker image is:
docker run -p 8501:8501 --name tfserving_genre --mount type=bind,source=/home/li/genre,target=/models/genre -e MODEL_NAME=genre -t tensorflow/serving:1.14.0 &
When I open http://localhost:8501/v1/models/genre in a browser, I get this message:
{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
The client prediction code is as follows:
import requests
import numpy as np
import os
import sys
from audio_to_spectrum_v2 import split_song_to_frames


# Define a base client class for TensorFlow Serving
class TFServingClient:
    """
    This is a base class that implements a TensorFlow Serving client
    """
    TF_SERVING_URL_FORMAT = '{protocol}://{hostname}:{port}/v1/models/{endpoint}:predict'

    def __init__(self, hostname, port, endpoint, protocol="http"):
        self.protocol = protocol
        self.hostname = hostname
        self.port = port
        self.endpoint = endpoint

    def _query_service(self, req_json):
        """
        :param req_json: dict (as defined in https://cloud.google.com/ml-engine/docs/v1/predict-request)
        :return: dict
        """
        server_url = self.TF_SERVING_URL_FORMAT.format(protocol=self.protocol,
                                                       hostname=self.hostname,
                                                       port=self.port,
                                                       endpoint=self.endpoint)
        response = requests.post(server_url, json=req_json)
        response.raise_for_status()
        print(response.json())
        return np.array(response.json()['output'])


# Define a specific client for our inception_v3 model
class GenreClient(TFServingClient):
    # INPUT_NAME is the config value we used when saving the model (the only value in the `input_names` list)
    INPUT_NAME = "input"

    def load_song(self, song_path):
        """Load a song from path, slice it into pieces, and extract features, returned in np.array format"""
        song_pieces = split_song_to_frames(song_path, False, 30)
        return song_pieces

    def predict(self, song_path):
        song_pieces = self.load_song(song_path)
        # Create a request json dict
        req_json = {
            "instances": song_pieces.tolist()
        }
        print(req_json)
        return self._query_service(req_json)


def main():
    song_path = sys.argv[1]
    print("file name:{}".format(os.path.split(song_path)[-1]))
    hostname = "localhost"
    port = "8501"
    endpoint = "genre"
    client = GenreClient(hostname=hostname, port=port, endpoint=endpoint)
    prediction = client.predict(song_path)
    print(prediction)


if __name__ == '__main__':
    main()
After running the prediction code, I got the following error:
Traceback (most recent call last):
  File "client_predict.py", line 90, in <module>
    main()
  File "client_predict.py", line 81, in main
    prediction = client.predict(song_path)
  File "client_predict.py", line 69, in predict
    return self._query_service(req_json)
  File "client_predict.py", line 40, in _query_service
    response.raise_for_status()
  File "/home/li/anaconda3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 501 Server Error: Not Implemented for url: http://localhost:8501/v1/models/genre:predict
I wonder what the reason for this deployment problem is, and how to solve it. Thanks, all.
I tried printing the response using:
pred = json.loads(r.content.decode('utf-8'))
print(pred)
The problem was caused by "conv implementation only supports NHWC tensor format for now."
In the end, I changed the data format from NCHW to NHWC in Conv2D.
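A minimal sketch of that change on the Keras side (the specific layer below is illustrative, not the original network):
# Hedged sketch: declare Conv2D with channels_last (NHWC) so the CPU-only TF Serving
# build can run it; the input shape becomes (height, width, channels) = (128, 1292, 1).
from keras.layers import Conv2D
from keras.models import Sequential

model = Sequential([
    Conv2D(32, (3, 3), activation='relu',
           data_format='channels_last',   # NHWC instead of NCHW
           input_shape=(128, 1292, 1)),   # was (1, 128, 1292) in NCHW
    # ... rest of the network unchanged ...
])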

Tensorflow served model fails with error: "You must feed a value for placeholder tensor"

The following is the signature def of the TensorFlow model being served:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['in'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 10)
        name: input_sentences:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['out'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 1)
        name: output_sentences:0
  Method name is: tensorflow/serving/predict
After serving this model, when I pass an input of shape [-1, 10], I get the following error:
"You must feed a value for placeholder tensor 'output_sentences' with dtype int32 and shape [?,1]"
even though output_sentences is part of outputs.
Please help me out.