Can not load pb file in tensorflow serving - tensorflow

I have used SavedModel (Inception_resnet_v2) to export the TensorFlow model files and use TensorFlow Serving to load the files.I have directly replaced offical minst saved_model.pb with my own Inception_resnet_v2 saved_model.pb file. But I got one error.
deep#ubuntu:~/serving$ bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=mnist --model_base_path=/home/deep/serving/tmp/mnist_model
2017-06-18 10:39:41.963490: I tensorflow_serving/model_servers/main.cc:146] Building single TensorFlow model file config: model_name: mnist model_base_path: home/deep/serving/tmp/mnist_model model_version_policy: 0
2017-06-18 10:39:41.963752: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-06-18 10:39:41.963762: I tensorflow_serving/model_servers/server_core.cc:421] (Re-)adding model: mnist
2017-06-18 10:39:42.065556: I tensorflow_serving/core/basic_manager.cc:698] Successfully reserved resources to load servable {name: mnist version: 1}
2017-06-18 10:39:42.065610: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: mnist version: 1}
2017-06-18 10:39:42.065648: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: mnist version: 1}
2017-06-18 10:39:42.065896: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/deep/serving/tmp/mnist_model/1
2017-06-18 10:39:42.066130: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:226] Loading SavedModel from: /home/deep/serving/tmp/mnist_model/1
2017-06-18 10:39:42.080775: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:274] Loading SavedModel: fail. Took 14816 microseconds.
2017-06-18 10:39:42.080822: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: mnist version: 1} failed: Not found: Could not find meta graph def matching supplied tags.
What should I do? Thanks!

I chatted to the Serving engineers, and here are some of their thoughts on this:
Looks like they need to specify a tag either in the saved model, or on
the command line. (log line of note: failed: Not found: Could not find
meta graph def matching supplied tags. )
It looks like the SavedModel loader is unable to find a graph
corresponding to the tags they have supplied. Here is some
documentation:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/saved_model#tags
Ah, to add: They could use the SavedModel CLI to inspect the model and
see what tag-sets are available. Here is the documentation for that:
https://www.tensorflow.org/versions/master/programmers_guide/saved_model_cli.
They can run
saved_model_cli show --dir <SavedModelDir>
to check what tag-sets are in the SavedModel if they have pip
installed tensorflow.

Related

Tensorflow 2.0 model checkpoint files to .pb/onnx file

For model checkpoint files (usually consist of .meta, .data, .index) generated from TF-2.0, how can I convert it to onnx or pb file?
since I found most of the existing tools, such as tf2onnx only support TF-1.x.
tf2onnx supports tf2, saved models, and checkpoint files. I would recommend making a saved model:
model.save("path")
Then use tf2onnx command line:
tf2onnx.convert --saved-model path/to/model --output model.onnx --opset 12
https://github.com/onnx/tensorflow-onnx#--saved-model

tensorflow serving returning NaN when predict

I had trained one GAN model and saved the generator by the following function:
tf.keras.models.save_model(
generator,
filepath=os.path.join(MODEL_PATH, 'model_saver'),
overwrite=True,
include_optimizer=False,
save_format=None,
options=None
)
It predicts successfully when load model by tf.keras.models.load_model in python. But when serving the model in tensorflow model server, the model returns NaN value.
I serve the model by the following:
zhaocc:~/products/tensorflow_server$ sudo docker run -t --rm -p 8502:8501 -v "/tmp/pix2pix/sketch_photo/model_saver:/models/photo2sketch" -e MODEL_NAME=photo2sketch tensorflow/serving &
[3] 30089
zhaocc:~/products/tensorflow_server$ 2020-06-17 12:57:31.745339: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: photo2sketch model_base_path: /models/photo2sketch
2020-06-17 12:57:31.745448: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-06-17 12:57:31.745459: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: photo2sketch
2020-06-17 12:57:31.846162: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: photo2sketch version: 1}
2020-06-17 12:57:31.846213: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: photo2sketch version: 1}
2020-06-17 12:57:31.846233: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: photo2sketch version: 1}
2020-06-17 12:57:31.846282: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/photo2sketch/1
2020-06-17 12:57:31.874158: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-17 12:57:31.874182: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/photo2sketch/1
2020-06-17 12:57:31.874315: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-17 12:57:31.952982: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-06-17 12:57:32.172641: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/photo2sketch/1
2020-06-17 12:57:32.248514: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 402236 microseconds.
2020-06-17 12:57:32.256576: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/photo2sketch/1/assets.extra/tf_serving_warmup_requests
2020-06-17 12:57:32.265064: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: photo2sketch version: 1}
2020-06-17 12:57:32.267113: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-06-17 12:57:32.269289: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
When I predict by REST request, it return NaN with correct shape:
[[[[nan nan nan]
[nan nan nan]
[nan nan nan]
...
[nan nan nan]
[nan nan nan]
[nan nan nan]]
Anybody knows why? How can I debug it? Thanks very much!
I had the very same problem with my Pix2Pix generator. The problem was with the training parameter. As explained here What does `training=True` mean when calling a TensorFlow Keras model? this parameter affects the results of the network. One possible solution is to remove all dropouts (and other affected parts) prior to saving the network. This solution did not work for me (probably missed something). So instead as a temporary workaround, I added 2 signatures to the model
#tf.function(input_signature=[tf.TensorSpec([None, 256,256,3], dtype=tf.float32)])
def model_predict1(input_batch):
return {'outputs': generator(input_batch, training=True)}
#tf.function(input_signature=[tf.TensorSpec([None, 256,256,3], dtype=tf.float32)])
def model_predict2(input_batch):
return {'outputs': generator(input_batch, training=False)}
...
generator.save(base_path + "kerassave",signatures={'predict1': model_predict1, 'predict2': model_predict2})
predict2 still always returned nans. predict1 worked, however.

why tensorrt optimized pb can not be deployed by tf-serving?

I am using tensorrt to accelerate inference speed of Tacotron2 model. I used
tensorrt 5.0.2.6 version and tensorflow 1.13.0.rc0 version.
I convert savedmodel to tftrt savedmodel using the tensorrt api below:
trt.create_inference_graph(
input_graph_def=None,
outputs=None,
max_batch_size=32,
input_saved_model_dir=os.path.join(args.export_dir, args.version),
output_saved_model_dir=args.output_saved_model_dir,
precision_mode=args.precision_mode)
The outputed tensorrt_savedmodel.pb can not be imported into tensorboard for view but tensorrt_savedmodel.pb can be deployed with tf-serving.
However, when client request the tf-serving using grpc there is an error:
<_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The TF function for the TRT segment could not be empty
[[{{node model/inference/prenet/TRTEngineOp_33}}]]"
debug_error_string = "
{"created":"#1572417319.714936208","description":"Error received from peer ipv4:192.168.23.17:8500","file":"src/core/lib/surface/call.cc","file_line":1052,"grpc_message":"The TF function for the TRT segment could not be empty\n\t [[{{node model/inference/prenet/TRTEngineOp_33}}]]","grpc_status":3}
any solutions about this issue?
TensorFlow saved model also provides a formal and consistent way to use tensorrt. You can try to convert it with the saved_model_cli and then deploy to tf-serving.
usage: saved_model_cli convert [-h] --dir DIR --output_dir OUTPUT_DIR
--tag_set TAG_SET
{tensorrt} ...
Usage example:
To convert the SavedModel to one that have TensorRT ops:
$saved_model_cli convert \
--dir /tmp/saved_model \
--tag_set serve \
--output_dir /tmp/saved_model_trt \
tensorrt
optional arguments:
-h, --help show this help message and exit
--dir DIR directory containing the SavedModel to convert
--output_dir OUTPUT_DIR
output directory for the converted SavedModel
--tag_set TAG_SET tag-set of graph in SavedModel to convert, separated by ','
conversion methods:
valid conversion methods
{tensorrt} the conversion to run with the SavedModel

tf-serving abnormal exit without error message

tf-serving abnormal exit without error message
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ReaHat EL6
TensorFlow Serving installed from (source or binary): source using bazel 0.18.0
TensorFlow Serving version: 1.12.0
Describe the problem
i compile the tf-serving using bazel in RHEL 6.9, and start it using:
./model_servers/tensorflow_model_server --model_config_file=./data/models.conf --rest_api_port=8502
models.conf:
model_config_list: {
config: {
name: "model_1",
base_path:"/search/work/tf_serving_bin/tensorflow_serving/data/model_data/model_1",
model_platform: "tensorflow",
model_version_policy: {
latest: {
num_versions: 1
}
}
}
}
Client using C++, and use libCurl to request tf-serving REST api, but, tf-serving often abnormal exits without error message in some minutes.
When my client service requests localhost tf-serving, the question occur frequently. But, client service requests tf-serving at other machines, the question do not occur, qps < 100.
I check memory, cpu idle, etc... no problems is found. so, it is very strange.
export export TF_CPP_MIN_VLOG_LEVEL=1, no error/critical message too.
Source code / logs
2019-01-09 09:28:35.118183: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-01-09 09:28:35.118259: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: app_ks_nfm_1
2019-01-09 09:28:35.227383: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227424: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227443: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227492: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359
2019-01-09 09:28:35.227530: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359
2019-01-09 09:28:35.256712: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-01-09 09:28:35.267728: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-09 09:28:35.313087: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:162] Restoring SavedModel bundle.
2019-01-09 09:28:38.797633: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:138] Running MainOp with key legacy_init_op on SavedModel bundle.
2019-01-09 09:28:38.803984: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:259] SavedModel load for tags { serve }; Status: success. Took 3570131 microseconds.
2019-01-09 09:28:38.804027: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359/assets.extra/tf_serving_warmup_requests
2019-01-09 09:28:38.804148: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:38.831860: I tensorflow_serving/model_servers/server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-01-09 09:28:38.865243: I tensorflow_serving/model_servers/server.cc:302] Exporting HTTP/REST API at:localhost:8502 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...
It is not an abnormal exit. It is an indication that the Server is ready to receive the Inference Requests.
For clarification, please find the below explanation:
docker run --runtime=nvidia -p 8501:8501 \
--mount type=bind,\ source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,\
target=/models/half_plus_two \
-e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &
This will run the docker container with the nvidia-docker runtime, launch the TensorFlow Serving Model Server, bind the REST API port 8501, and map our desired model from our host to where models are expected in the container. We also pass the name of the model as an environment variable, which will be important when we query the model.
TIP: Before querying the model, be sure to wait till you see a message like the following, indicating that the server is ready to receive requests:
2018-07-27 00:07:20.773693: I tensorflow_serving/model_servers/main.cc:333]
Exporting HTTP/REST API at:localhost:8501 ...
After that Message, just press Enter and you can query the model using the below command
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
-X POST http://localhost:8501/v1/models/half_plus_two:predict
For more information, refer the below link:
https://www.tensorflow.org/tfx/serving/docker#gpu_serving_example
The Reason:the short connection product a large amount of TCP Status 'TIME_WAIT', the available linux system file handle is occupied.

Distributed Deep recommended System- Tensorflow: failed: Session bundle or SavedModel bundle not found at specified export location

After executing the distributed tensor flow as mentioned in the following link:
https://github.com/tobegit3hub/deep_recommend_system/tree/master/distributed
I got the following in ./checkpoint folder;
checkpoint
graph.pbtxt
model.ckpt-269.data-00000-of-00001
model.ckpt-269.index
model.ckpt-269.meta
I wanted to run tensorFlow serving on the above model provided in TensorFlow Serving. But I do get the below error when doing so:
./tensorflow_model_server --port="9000" --model_base_path=./model/
2017-07-14 15:32:32.791636: I tensorflow_serving/model_servers/main.cc:151] Building single TensorFlow model file config: model_name: default model_base_path: ./model/ model_version_policy: 0
2017-07-14 15:32:32.792156: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-07-14 15:32:32.792188: I tensorflow_serving/model_servers/server_core.cc:421] (Re-)adding model: default
2017-07-14 15:32:32.893072: I tensorflow_serving/core/basic_manager.cc:698] Successfully reserved resources to load servable {name: default version: 1}
2017-07-14 15:32:32.893143: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: default version: 1}
2017-07-14 15:32:32.893165: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: default version: 1}
2017-07-14 15:32:32.893252: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: default version: 1} failed: Not found: Session bundle or SavedModel bundle not found at specified export location
Any suggestion on this ?