Cannot deploy YAMNet model to SageMaker - tensorflow

I followed this tutorial and fine-tuned the model. The part that saves the serving model looks like this:
import tensorflow as tf
import tensorflow_hub as hub

# yamnet_model_handle, my_model, and ReduceMeanLayer are defined earlier in
# the tutorial.
saved_model_path = 'dogs_and_cats_yamnet/yamnet-model/00000001'
input_segment = tf.keras.layers.Input(shape=(), dtype=tf.float32, name='audio')
embedding_extraction_layer = hub.KerasLayer(yamnet_model_handle,
                                            trainable=False, name='yamnet')
_, embeddings_output, _ = embedding_extraction_layer(input_segment)
serving_outputs = my_model(embeddings_output)
serving_outputs = ReduceMeanLayer(axis=0, name='classifier')(serving_outputs)
serving_model = tf.keras.Model(input_segment, serving_outputs)
serving_model.save(saved_model_path, include_optimizer=False)
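For what it's worth, the exported signature can be inspected before packaging (a minimal sketch of my own, not part of the tutorial):
import tensorflow as tf

# Reload the SavedModel and print its serving signature, to see the exact
# input name, dtype, and shape the endpoint will expect.
reloaded = tf.saved_model.load('dogs_and_cats_yamnet/yamnet-model/00000001')
serving_fn = reloaded.signatures['serving_default']
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)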
Then I followed this page, uploading the model to S3 and deploying it.
from sagemaker.session import Session
from sagemaker.tensorflow import TensorFlowModel

!tar -C "$PWD" -czf dogs_and_cats_yamnet.tar.gz dogs_and_cats_yamnet/
model_data = Session().upload_data(path="dogs_and_cats_yamnet.tar.gz", key_prefix="model")
model = TensorFlowModel(model_data=model_data, role=sagemaker_role, framework_version="2.3")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")
Deployment seems successful, but when I try to run inference,
import numpy as np

waveform = np.zeros((3 * 48000), dtype=np.float32)
result = predictor.predict(waveform)
the following error occurs.
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"error": "The first dimension of paddings must be the rank of inputs[1,2] [1,144000]\n\t [[{{node yamnet_frames/tf_op_layer_Pad/Pad}}]]"
I have no idea why this happens; I have been struggling with it for hours without finding a clue. YAMNet works fine when I pull the model directly from TF Hub and run inference with it.
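For reference, this is roughly how the exported model can be exercised locally (my own sketch, assuming the signature input keeps the 'audio' name from the Input layer); it should show whether the SavedModel itself expects a rank-1 waveform:
import numpy as np
import tensorflow as tf

reloaded = tf.saved_model.load('dogs_and_cats_yamnet/yamnet-model/00000001')
serving_fn = reloaded.signatures['serving_default']
# Same zeros waveform as above, passed without any extra batch dimension.
outputs = serving_fn(audio=tf.constant(np.zeros(3 * 48000, dtype=np.float32)))
print({name: t.shape for name, t in outputs.items()})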
This is kind of a minor question I guess, but I would appreciate any helpful answers.
Thank you in advance.

Related

How to use mlflow to deploy model that requires tensorflow_text for bert on local machine?

I recently used mlflow 1.29.0 to track my model training. I use BERT for text embedding, which needs tensorflow_text imported to register its ops before training. Here is an example:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
import mlflow

def create_model():
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text_input')
    preprocessed_text = preprocess_model(text_input)
    encoder_text = encoder_model(preprocessed_text)['pooled_output']
    text_output = tf.keras.layers.Dropout(0.1, name='dropout1')(encoder_text)
    text_output = tf.keras.layers.Dense(units=400, activation=tf.keras.activations.sigmoid, name='text_dense1')(text_output)
    text_output = tf.keras.layers.Dropout(0.1, name='dropout2')(text_output)
    final_output = tf.keras.layers.Dense(units=1, activation=tf.keras.activations.sigmoid, name='output')(text_output)
    model = tf.keras.Model(inputs=[text_input], outputs=[final_output])
    return model

if __name__ == '__main__':
    preprocess_path = 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'
    encoder_path = 'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/4'
    preprocess_model = hub.KerasLayer(preprocess_path)
    encoder_model = hub.KerasLayer(encoder_path)
    with mlflow.start_run() as run:
        model = create_model()
        model.fit(...)
        mlflow.keras.log_model(keras_model=model, ...)
    mlflow.end_run()
The code runs successfully and the mlflow UI shows everything. However, when I start to deploy the model on my local machine with the following commands
mlflow sagemaker build-and-push-container
mlflow sagemaker run-local -m runs:/XXXXX/XXXX -p 4999
it showed the following error:
FileNotFoundError: Op type not registered 'CaseFoldUTF8' in binary running on mighty. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
I think this is because tensorflow_text needs to be imported before the model is loaded (the conda.yaml generated by mlflow does contain tensorflow-text==2.3.0).
I've met this error several times when training the model; putting 'import tensorflow_text as text' at the top fixed it. However, I'm not quite sure how to do that when deploying the model locally. Can anyone help me with that? Thank you!
I tried other commands, like mlflow models serve -m runs:/XXXXX/XXXX -p 4999, and the error is still there.
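One direction that might work (a rough sketch of my own, not verified against mlflow 1.29.0): log a pyfunc wrapper whose load_context imports tensorflow_text before loading the Keras model, so the op is registered inside the serving process.
import mlflow.pyfunc

class BertPyfunc(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Importing tensorflow_text registers CaseFoldUTF8 and the other
        # custom ops before the saved graph is loaded.
        import tensorflow_text  # noqa: F401
        import tensorflow as tf
        self.model = tf.keras.models.load_model(context.artifacts['keras_model'])

    def predict(self, context, model_input):
        return self.model.predict(model_input)

# Hypothetical logging call, used instead of mlflow.keras.log_model:
# mlflow.pyfunc.log_model('model', python_model=BertPyfunc(),
#                         artifacts={'keras_model': 'path/to/saved/model'})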

Model testing on AWS sagemaker "could not convert string to float"

The XGBoost model was trained on AWS SageMaker and deployed successfully, but I keep getting the following error: ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (415) from model with message "could not convert string to float: ".
Any thoughts?
The test data is as follows:
        size       mean
269   5600.0  17.499633
103   1754.0   9.270272
160   4968.0  14.080601
40       4.0  17.500000
266  36308.0  11.421855
import numpy as np

test_data_array = test_data.drop(['mean'], axis=1).as_matrix()
test_data_array = np.array([np.float32(x) for x in test_data_array])
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer

def predict(data, rows=32):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    #print(split_array)
    predictions = ''
    for array in split_array:
        print(array[0], type(array[0]))
        predictions = ','.join([predictions, xgb_predictor.predict(array[0]).decode('utf-8')])
    return np.fromstring(predictions[1:], sep=',')

predictions = predict(test_data_array)
SageMaker XGBoost cannot handle CSV input with a header, so make sure the string header is removed before sending the data to the endpoint.
Also, for CSV prediction, SageMaker XGBoost assumes the input does not have a label column, so remove the label column from the input data as well.
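A minimal sketch of what this means in practice, reusing the question's test_data frame and treating 'mean' as the label column (both names come from the question; the rest is illustrative):
import numpy as np

# Features only: drop the label column; .values carries no header row either.
features = test_data.drop(['mean'], axis=1).values.astype(np.float32)

# With content_type 'text/csv' and csv_serializer set as above, each row is
# sent as a plain CSV line with no header and no label.
result = xgb_predictor.predict(features[0]).decode('utf-8')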

Correct format for input data on CloudML

I'm trying to send a job up to my object detection model on CloudML to get predictions. I'm following the guide at https://cloud.google.com/ml-engine/docs/online-predict but I'm getting an error when submitting the request:
RuntimeError: Prediction failed: Error processing input: Expected uint8, got '\xf6>\x00\x01\x04\xa4d\x94...(more bytes)...\x00\x10\x10\x10\x04\x80\xd9' of type 'str' instead.
This is my code:
import base64

img = base64.b64encode(open("file.jpg", "rb").read()).decode('utf-8')
json = {"b64": img}  # note: this shadows the built-in json module
result = predict_json(project, model, json, "v1")
My fault, I forgot to add --input_type encoded_image_string_tensor when I exported the graph.
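For anyone hitting the same thing, a hedged sketch of the request once the graph is exported with --input_type encoded_image_string_tensor (predict_json is the helper from the linked guide): each instance becomes a JSON object whose "b64" field holds the base64-encoded image bytes.
import base64

with open("file.jpg", "rb") as f:
    img = base64.b64encode(f.read()).decode('utf-8')

instances = [{"b64": img}]  # one object per image
result = predict_json(project, model, instances, "v1")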

How to use Tensorflow

I've built multiple DNNs and ConvNets using TensorFlow, and I can now reach good accuracy. My question is: how can I use these trained networks in a real example?
In the case of a ConvNet for computer vision, how can I use the model to classify a new picture? Can I generate something like convNN.exe that takes an image as an input parameter and outputs the classification result?
Once you've trained the model, you should save it somewhere by adding code similar to
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import signature_constants, tag_constants

builder = saved_model_builder.SavedModelBuilder(export_path)
builder.add_meta_graph_and_variables(
    sess, [tag_constants.SERVING],
    signature_def_map={
        'predict_images':
            prediction_signature,
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            classification_signature,
    },
    legacy_init_op=legacy_init_op)
builder.save()
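For context, the prediction_signature referenced above is built along these lines (adapted from the TF Serving MNIST export example; x and y stand in for your model's input and output tensors):
from tensorflow.python.saved_model import signature_constants, signature_def_utils, utils

prediction_signature = signature_def_utils.build_signature_def(
    inputs={'images': utils.build_tensor_info(x)},    # x: model input tensor
    outputs={'scores': utils.build_tensor_info(y)},   # y: model output tensor
    method_name=signature_constants.PREDICT_METHOD_NAME)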
Then, you can use Tensorflow serving to serve your model using a high-performance C++ server by running
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
--port=9000 --model_name=mnist \
--model_base_path=/tmp/mnist_model/
Modifying the code for your model, of course. You'll need to implement a client; there's an example for MNIST here. The guts of the client would be something like:
def do_inference(hostport, work_dir, concurrency, num_tests):
  """Tests PredictionService with concurrent requests.

  Args:
    hostport: Host:port address of the PredictionService.
    work_dir: The full path of working directory for test data set.
    concurrency: Maximum number of concurrent requests.
    num_tests: Number of test images to use.

  Returns:
    The classification error rate.

  Raises:
    IOError: An error occurred processing test data set.
  """
  test_data_set = mnist_input_data.read_data_sets(work_dir).test
  host, port = hostport.split(':')
  channel = implementations.insecure_channel(host, int(port))
  stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
  result_counter = _ResultCounter(num_tests, concurrency)
  for _ in range(num_tests):
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'mnist'
    request.model_spec.signature_name = 'predict_images'
    image, label = test_data_set.next_batch(1)
    request.inputs['images'].CopyFrom(
        tf.contrib.util.make_tensor_proto(image[0], shape=[1, image[0].size]))
    result_counter.throttle()
    result_future = stub.Predict.future(request, 5.0)  # 5 seconds
    result_future.add_done_callback(
        _create_rpc_callback(label[0], result_counter))
  return result_counter.get_error_rate()


def main(_):
  if FLAGS.num_tests > 10000:
    print('num_tests should not be greater than 10k')
    return
  if not FLAGS.server:
    print('please specify server host:port')
    return
  error_rate = do_inference(FLAGS.server, FLAGS.work_dir,
                            FLAGS.concurrency, FLAGS.num_tests)
  print('\nInference error rate: %s%%' % (error_rate * 100))


if __name__ == '__main__':
  tf.app.run()
This is in Python, of course, but there's no reason you couldn't use another language (e.g. Go or C++) if you wanted to create a binary executable.

Tensorflow: Bug when using `tf.contrib.metrics.streaming_mean_iou`

I'm getting a strange error when trying to compute the intersection over union using TensorFlow's tf.contrib.metrics.streaming_mean_iou.
This is the code I was using before, which works perfectly fine:
import tensorflow as tf

sess = tf.Session()
label = tf.image.decode_png(tf.read_file('/path/to/label.png'), channels=1)
label_lin = tf.reshape(label, [-1,])
weights = tf.cast(tf.less_equal(label_lin, 10), tf.int32)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(label_lin, label_lin, num_classes=11, weights=weights)
init = tf.local_variables_initializer()
sess.run(init)
sess.run([update_op])
However, when I use a mask like this:
mask = tf.image.decode_png(tf.read_file('/path/to/mask_file.png'), channels=1)
mask_lin = tf.reshape(mask, [-1,])
mask_lin = tf.cast(mask_lin, tf.int32)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(label_lin, label_lin, num_classes=11, weights=mask_lin)
init = tf.local_variables_initializer()
sess.run(init)
sess.run([update_op])
it keeps failing after an irregular number of iterations, showing this error:
*** Error in `/usr/bin/python': corrupted double-linked list: 0x00007f29d0022fd0 ***
I checked the shape and data type of both mask_lin and weights. They are the same, so I cannot really see what is going wrong here.
Also, the fact that the error comes after calling update_op an irregular number of times is strange. Maybe TF empties the mask_lin object after several sess.run() calls?
Or is this some TF bug? But then again, why would it work with weights...
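Not a full answer, but one diagnostic worth running (my own sketch): in the working snippet the weights explicitly zero out every position where the label exceeds 10, while an arbitrary mask PNG (e.g. one holding 0/255 instead of 0/1) may leave labels >= num_classes with nonzero weight, and such labels could index outside the 11x11 confusion matrix, which would explain heap corruption rather than a clean Python error.
# Check whether the mask leaves any out-of-range label (> 10) with a
# nonzero weight; such labels would fall outside the 11x11 confusion matrix.
label_vals, mask_vals = sess.run([label_lin, mask_lin])
bad = ((label_vals > 10) & (mask_vals != 0)).sum()
print('out-of-range labels with nonzero mask weight:', bad)
print('mask value range:', mask_vals.min(), mask_vals.max())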