Distributed deep_recommend_system - TensorFlow: failed: Session bundle or SavedModel bundle not found at specified export location - tensorflow

After running the distributed TensorFlow example described at the following link:
https://github.com/tobegit3hub/deep_recommend_system/tree/master/distributed
I got the following files in the ./checkpoint folder:
checkpoint
graph.pbtxt
model.ckpt-269.data-00000-of-00001
model.ckpt-269.index
model.ckpt-269.meta
I wanted to serve the above model with TensorFlow Serving, but I get the error below when doing so:
./tensorflow_model_server --port="9000" --model_base_path=./model/
2017-07-14 15:32:32.791636: I tensorflow_serving/model_servers/main.cc:151] Building single TensorFlow model file config: model_name: default model_base_path: ./model/ model_version_policy: 0
2017-07-14 15:32:32.792156: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-07-14 15:32:32.792188: I tensorflow_serving/model_servers/server_core.cc:421] (Re-)adding model: default
2017-07-14 15:32:32.893072: I tensorflow_serving/core/basic_manager.cc:698] Successfully reserved resources to load servable {name: default version: 1}
2017-07-14 15:32:32.893143: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: default version: 1}
2017-07-14 15:32:32.893165: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: default version: 1}
2017-07-14 15:32:32.893252: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: default version: 1} failed: Not found: Session bundle or SavedModel bundle not found at specified export location
Any suggestions on this?

Related

Do I need to add tf.compat.v1.disable_eager_execution() to export_inference_graph.py to convert tf.train.Checkpoint to SavedModel?

I found a question about this error (in a different scenario), along with many GitHub issues and articles, but it seemingly always has to do with people upgrading from TF 1.x to TF 2.x. I'm not doing that.
Here are my versions:
tensorflow 2.5.0
tensorflow-addons 0.13.0
tensorflow-datasets 4.3.0
tensorflow-estimator 2.5.0
tensorflow-gpu 2.5.0
I'm using the TF Object Detection API, converting a model trained in TF 2.5 via Python to a tensorflow.js-compatible model, and asked a question about it. The answer given was to start by running:
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
So my command ended up being:
py Tensorflow\models\research\object_detection\export_inference_graph.py
--input_type image_tensor
--pipeline_config_path Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config
--trained_checkpoint_prefix Tensorflow\workspace\pre-trained-models\ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8\checkpoint\ckpt-0.data-00000-of-00001
--output_directory Tensorflow\workspace\models\my_ssd_mobnet\export
Which resulted in the error:
RuntimeError: tf.placeholder() is not compatible with eager execution
I do see in the logs a common cause of this error, so I know where it's coming from:
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 186, in _image_tensor_input_placeholder
input_tensor = tf.placeholder(
But I don't understand how to deal with this, since I'm not writing any of these TensorFlow modules; I'm just trying to do something basic with existing modules, like converting a tf.train.Checkpoint to a SavedModel.
Normally the answer seems to be to call tf.compat.v1.disable_eager_execution(), but the odd thing is that this isn't my code, and I don't know what else I might break in this conversion script by disabling a feature. Nor am I comfortable enough with the TensorFlow API yet to really understand that script.
Full logs and trace:
2021-07-15 09:40:24.482953: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-15 09:40:26.835151: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-15 09:40:26.856379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-07-15 09:40:26.856487: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-15 09:40:26.861810: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-15 09:40:26.861891: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-15 09:40:26.864685: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-15 09:40:26.865561: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-15 09:40:26.872246: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-15 09:40:26.874465: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-15 09:40:26.874979: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-15 09:40:26.875238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-15 09:40:26.876220: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-15 09:40:26.877353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-07-15 09:40:26.877556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-15 09:40:27.285985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 09:40:27.286153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-15 09:40:27.286917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-15 09:40:27.287164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5957 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Traceback (most recent call last):
File "Tensorflow\models\research\object_detection\export_inference_graph.py", line 206, in <module>
tf.app.run()
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "Tensorflow\models\research\object_detection\export_inference_graph.py", line 194, in main
exporter.export_inference_graph(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 611, in export_inference_graph
_export_inference_graph(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 503, in _export_inference_graph
outputs, placeholder_tensor_dict = build_detection_graph(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 457, in build_detection_graph
placeholder_tensor, input_tensors = input_placeholder_fn_map[input_type](
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 186, in _image_tensor_input_placeholder
input_tensor = tf.placeholder(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3268, in placeholder
raise RuntimeError("tf.placeholder() is not compatible with "
RuntimeError: tf.placeholder() is not compatible with eager execution.
What could I be doing here that would cause this error? Did I install the wrong version of the conversion script? I checked that I have the latest TensorFlow files from the official repo, and that's where export_inference_graph.py is found. Does the conversion script just not work with TensorFlow 2.x? Do I need to modify the conversion script with tf.compat.v1.disable_eager_execution()? Will that cause other problems in the script, since I'm disabling a feature?
Edit:
I know some models in the object detection model zoo were built for TF 1.x (model zoo) and others for TF 2.x (model zoo). I verified that I have a 2.x model, so that's not the cause.
TensorFlow allows you to save a model in multiple formats (checkpoint or SavedModel). A checkpoint just saves the weights for every layer, so when loading the model you need to first define the network architecture and then load the weights. A SavedModel saves the complete model, i.e. architecture, weights, and training configuration (including the optimizer weights). This link has more details about the various formats that are available.
https://www.tensorflow.org/tutorials/keras/save_and_load
In your case, since tfjs requires a SavedModel as input, you can save the TensorFlow model directly in the SavedModel format rather than saving it first as a checkpoint and then trying to convert it to the SavedModel format.
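As a rough sketch of that suggestion (the model and export path here are hypothetical stand-ins, not taken from the question), saving directly to the SavedModel format in TF 2.x can look like this:
import tensorflow as tf

# Hypothetical stand-in model; in practice this would be the detection model you trained.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])

# Save straight to the SavedModel format; a numbered subdirectory is the layout
# TensorFlow Serving expects, and it does no harm for the tfjs converter.
export_dir = "exported_model/1"
tf.saved_model.save(model, export_dir)

# export_dir now contains saved_model.pb plus the variables/ folder,
# which is the SavedModel input that tfjs accepts.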

TensorFlow Serving returning NaN when predicting

I trained a GAN model and saved the generator with the following function:
tf.keras.models.save_model(
    generator,
    filepath=os.path.join(MODEL_PATH, 'model_saver'),
    overwrite=True,
    include_optimizer=False,
    save_format=None,
    options=None
)
It predicts successfully when I load the model with tf.keras.models.load_model in Python, but when the model is served by TensorFlow Model Server, it returns NaN values.
I serve the model as follows:
zhaocc:~/products/tensorflow_server$ sudo docker run -t --rm -p 8502:8501 -v "/tmp/pix2pix/sketch_photo/model_saver:/models/photo2sketch" -e MODEL_NAME=photo2sketch tensorflow/serving &
[3] 30089
zhaocc:~/products/tensorflow_server$ 2020-06-17 12:57:31.745339: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: photo2sketch model_base_path: /models/photo2sketch
2020-06-17 12:57:31.745448: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-06-17 12:57:31.745459: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: photo2sketch
2020-06-17 12:57:31.846162: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: photo2sketch version: 1}
2020-06-17 12:57:31.846213: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: photo2sketch version: 1}
2020-06-17 12:57:31.846233: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: photo2sketch version: 1}
2020-06-17 12:57:31.846282: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/photo2sketch/1
2020-06-17 12:57:31.874158: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-17 12:57:31.874182: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/photo2sketch/1
2020-06-17 12:57:31.874315: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-17 12:57:31.952982: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-06-17 12:57:32.172641: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/photo2sketch/1
2020-06-17 12:57:32.248514: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 402236 microseconds.
2020-06-17 12:57:32.256576: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/photo2sketch/1/assets.extra/tf_serving_warmup_requests
2020-06-17 12:57:32.265064: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: photo2sketch version: 1}
2020-06-17 12:57:32.267113: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-06-17 12:57:32.269289: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
When I predict via a REST request, it returns NaN with the correct shape:
[[[[nan nan nan]
[nan nan nan]
[nan nan nan]
...
[nan nan nan]
[nan nan nan]
[nan nan nan]]
Does anybody know why? How can I debug it? Thanks very much!
I had the very same problem with my Pix2Pix generator. The problem was with the training parameter. As explained in What does `training=True` mean when calling a TensorFlow Keras model?, this parameter affects the results of the network. One possible solution is to remove all dropouts (and other affected parts) prior to saving the network. That solution did not work for me (I probably missed something), so instead, as a temporary workaround, I added two signatures to the model:
@tf.function(input_signature=[tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32)])
def model_predict1(input_batch):
    return {'outputs': generator(input_batch, training=True)}

@tf.function(input_signature=[tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32)])
def model_predict2(input_batch):
    return {'outputs': generator(input_batch, training=False)}

...
generator.save(base_path + "kerassave", signatures={'predict1': model_predict1, 'predict2': model_predict2})
predict2 still always returned NaNs; predict1 worked, however.
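To call one of those named signatures through TensorFlow Serving's REST API, the request body needs a signature_name field. A minimal sketch, assuming the container from the question (model name photo2sketch, host port 8502 mapped to the REST port) and a 256x256x3 input:
import numpy as np
import requests

# Dummy input batch with the shape the signatures expect (values are placeholders).
batch = np.zeros((1, 256, 256, 3), dtype=np.float32)

payload = {
    "signature_name": "predict1",  # the workaround signature that returned real values
    "instances": batch.tolist(),
}
resp = requests.post(
    "http://localhost:8502/v1/models/photo2sketch:predict",
    json=payload,
)
print(resp.json())  # {'predictions': [...]} on success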

tf-serving abnormal exit without error message

System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): RedHat EL6
TensorFlow Serving installed from (source or binary): source using bazel 0.18.0
TensorFlow Serving version: 1.12.0
Describe the problem
I compiled tf-serving using bazel on RHEL 6.9 and start it using:
./model_servers/tensorflow_model_server --model_config_file=./data/models.conf --rest_api_port=8502
models.conf:
model_config_list: {
  config: {
    name: "model_1",
    base_path: "/search/work/tf_serving_bin/tensorflow_serving/data/model_data/model_1",
    model_platform: "tensorflow",
    model_version_policy: {
      latest: {
        num_versions: 1
      }
    }
  }
}
The client is written in C++ and uses libcurl to call the tf-serving REST API, but tf-serving often exits abnormally, without any error message, within a few minutes.
When my client service sends requests to tf-serving on localhost, the problem occurs frequently. But when the client service sends requests to tf-serving on other machines, the problem does not occur (QPS < 100).
I checked memory, CPU idle, etc., and found no problems, so it is very strange.
I exported TF_CPP_MIN_VLOG_LEVEL=1, but there was no error/critical message either.
Source code / logs
2019-01-09 09:28:35.118183: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-01-09 09:28:35.118259: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: app_ks_nfm_1
2019-01-09 09:28:35.227383: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227424: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227443: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227492: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359
2019-01-09 09:28:35.227530: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359
2019-01-09 09:28:35.256712: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-01-09 09:28:35.267728: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-09 09:28:35.313087: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:162] Restoring SavedModel bundle.
2019-01-09 09:28:38.797633: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:138] Running MainOp with key legacy_init_op on SavedModel bundle.
2019-01-09 09:28:38.803984: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:259] SavedModel load for tags { serve }; Status: success. Took 3570131 microseconds.
2019-01-09 09:28:38.804027: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359/assets.extra/tf_serving_warmup_requests
2019-01-09 09:28:38.804148: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:38.831860: I tensorflow_serving/model_servers/server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-01-09 09:28:38.865243: I tensorflow_serving/model_servers/server.cc:302] Exporting HTTP/REST API at:localhost:8502 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...
It is not an abnormal exit. It is an indication that the server is ready to receive inference requests.
For clarification, please see the explanation below:
docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &
This will run the docker container with the nvidia-docker runtime, launch the TensorFlow Serving Model Server, bind the REST API port 8501, and map our desired model from our host to where models are expected in the container. We also pass the name of the model as an environment variable, which will be important when we query the model.
TIP: Before querying the model, be sure to wait till you see a message like the following, indicating that the server is ready to receive requests:
2018-07-27 00:07:20.773693: I tensorflow_serving/model_servers/main.cc:333]
Exporting HTTP/REST API at:localhost:8501 ...
After that message, just press Enter, and you can query the model using the command below:
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
-X POST http://localhost:8501/v1/models/half_plus_two:predict
For more information, refer to the link below:
https://www.tensorflow.org/tfx/serving/docker#gpu_serving_example
The reason: the short-lived connections produced a large number of TCP connections in the TIME_WAIT state, which used up the available Linux file handles.
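The client in the question is C++/libcurl, but the idea behind the fix is simply to reuse one TCP connection for many requests instead of opening a new one per request. A hedged illustration in Python (the inputs are placeholders; the model name model_1 and REST port 8502 come from the question's config and command line):
import requests

# A Session keeps the underlying TCP connection alive (HTTP keep-alive),
# so repeated predictions do not leave thousands of sockets in TIME_WAIT.
session = requests.Session()

# Placeholder inputs; the real model_1 expects whatever feature layout it was exported with.
batches = [[[1.0, 2.0, 3.0]] for _ in range(100)]

for batch in batches:
    resp = session.post(
        "http://localhost:8502/v1/models/model_1:predict",
        json={"instances": batch},
    )
    resp.raise_for_status()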

Can a "address family for nodename not supported" warning prevent proper serving?

I managed to export a Keras model for segmentation into a tensorflow/serving:1.10.0-gpu-based container. However, at startup I notice a warning in the docker logs, just before the event loop starts: [warn] getaddrinfo: address family for nodename not supported. I'm not sure what this means, but so far I haven't been able to get a response from the server. Instead the client receives status = StatusCode.UNAVAILABLE, details="OS Error", "grpc_status":14.
Is this somehow related to that warning? Am I experiencing some kind of networking problem between the gRPC client and the tfserving container due to this unsupported address family?
For completeness, I post the docker logs below. Note that I cleared timestamps and unimportant lines out of the log for readability:
[]: I tensorflow_serving/model_servers/main.cc:157] Building single TensorFlow model file config: model_name: mrcnn model_base_path: /models/mrcnn
[]: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
[]: I tensorflow_serving/model_servers/server_core.cc:517] (Re-)adding model: mrcnn
[]: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: mrcnn version: 1}
[]: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: mrcnn version: 1}
[]: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: mrcnn version: 1}
[]: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /models/mrcnn/1
[]: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/mrcnn/1
[]: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
<skip>
[]: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10277 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:68:00.0, compute capability: 6.1)
[]: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:113] Restoring SavedModel bundle.
[]: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:148] Running LegacyInitOp on SavedModel bundle.
[]: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:233] SavedModel load for tags { serve }; Status: success. Took 1240882 microseconds.
<skip>
[]: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: mrcnn version: 1}
[]: I tensorflow_serving/model_servers/main.cc:327] Running ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 235] RAW: Entering the event loop ...
[]: I tensorflow_serving/model_servers/main.cc:337] Exporting HTTP/REST API at:localhost:8501 ..
The short answer is no; that warning is benign. My hunch is that your client isn't able to talk to the server, possibly because of how you have bound the docker ports, your client's code, or how you're invoking it.
When you launch your container, do not forget to specify the port with the "-p" option.
docker run -d -p <port out>:<port in> <IMAGE>
Otherwise, you can get the IP address with this command:
docker-machine ip

Cannot load pb file in TensorFlow Serving

I have used SavedModel (Inception_resnet_v2) to export the TensorFlow model files and use TensorFlow Serving to load them. I directly replaced the official MNIST saved_model.pb with my own Inception_resnet_v2 saved_model.pb file, but I got an error.
deep#ubuntu:~/serving$ bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=mnist --model_base_path=/home/deep/serving/tmp/mnist_model
2017-06-18 10:39:41.963490: I tensorflow_serving/model_servers/main.cc:146] Building single TensorFlow model file config: model_name: mnist model_base_path: home/deep/serving/tmp/mnist_model model_version_policy: 0
2017-06-18 10:39:41.963752: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-06-18 10:39:41.963762: I tensorflow_serving/model_servers/server_core.cc:421] (Re-)adding model: mnist
2017-06-18 10:39:42.065556: I tensorflow_serving/core/basic_manager.cc:698] Successfully reserved resources to load servable {name: mnist version: 1}
2017-06-18 10:39:42.065610: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: mnist version: 1}
2017-06-18 10:39:42.065648: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: mnist version: 1}
2017-06-18 10:39:42.065896: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/deep/serving/tmp/mnist_model/1
2017-06-18 10:39:42.066130: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:226] Loading SavedModel from: /home/deep/serving/tmp/mnist_model/1
2017-06-18 10:39:42.080775: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:274] Loading SavedModel: fail. Took 14816 microseconds.
2017-06-18 10:39:42.080822: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: mnist version: 1} failed: Not found: Could not find meta graph def matching supplied tags.
What should I do? Thanks!
I chatted with the Serving engineers, and here are some of their thoughts on this:
Looks like they need to specify a tag either in the saved model or on the command line. (Log line of note: failed: Not found: Could not find meta graph def matching supplied tags.)
It looks like the SavedModel loader is unable to find a graph corresponding to the tags they have supplied. Here is some documentation:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/saved_model#tags
Ah, to add: they could use the SavedModel CLI to inspect the model and see what tag-sets are available. Here is the documentation for that:
https://www.tensorflow.org/versions/master/programmers_guide/saved_model_cli
They can run
saved_model_cli show --dir <SavedModelDir>
to check what tag-sets are in the SavedModel, if they have pip-installed TensorFlow.
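For completeness, the usual way this particular error is avoided at export time (in the TF 1.x API that this era of Serving expects) is to attach the serve tag when writing the SavedModel. A hedged sketch, using a tiny placeholder graph rather than the real Inception_resnet_v2 network, and the export path from the question:
import tensorflow as tf

export_dir = "/home/deep/serving/tmp/mnist_model/1"

with tf.Session(graph=tf.Graph()) as sess:
    # Tiny placeholder graph standing in for the real Inception_resnet_v2 network.
    x = tf.placeholder(tf.float32, shape=[None, 4], name="input")
    y = tf.layers.dense(x, 2, name="output")
    sess.run(tf.global_variables_initializer())

    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={"input": x}, outputs={"output": y})

    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    # The SERVING tag ("serve") is what tensorflow_model_server looks for;
    # without it the loader fails with "Could not find meta graph def matching supplied tags".
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={"predict": signature})
    builder.save()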