I can't find proper documentation on how to serve the Inception or MobileNet models and write a gRPC client that connects to the server and performs image classification.
So far I've only managed to set up the tfserving image on CPU; I can't get it to run on my GPU.
When I make a gRPC client request, it fails with the following error:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Expects arg[0] to be float but string is provided"
debug_error_string = "{"created":"#1571717090.210000000","description":"Error received from peer","file":"src/core/lib/surface/","file_line":1017,"grpc_message":"Expects arg[0] to be float but string is provided","grpc_status":3}"
I understand there is a problem with the request format, but I couldn't find documentation for the gRPC client that points me in the right direction.
Here's the grpc client that I used for the request.
from __future__ import print_function
import grpc
import tensorflow as tf
import time
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

tf.app.flags.DEFINE_string('server', 'localhost:8505',
                           'PredictionService host:port')
tf.app.flags.DEFINE_string('image', 'E:/Data/Docker/tf_serving/cat.jpg', 'path to image')
FLAGS = tf.app.flags.FLAGS


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Send request
    with open(FLAGS.image, 'rb') as f:
        # See prediction_service.proto for gRPC request/response details.
        data = f.read()
        request = predict_pb2.PredictRequest()
        request.model_spec.name = 'inception'
        request.model_spec.signature_name = ''
        request.inputs['image'].CopyFrom(tf.contrib.util.make_tensor_proto(data, shape=[1]))
        result = stub.Predict(request, 5.0)  # 5 secs timeout
    print("Inception Client Passed")


if __name__ == '__main__':
    tf.app.run()

As I understand it, there are two issues in your question:
A) Running tfserving on GPU.
B) Making a successful gRPC client request.
Let's take them one by one.
Running tfserving on GPU
It is a simple two-step process.
First, pull the latest GPU image from the official Docker Hub page:
docker pull tensorflow/serving:latest-gpu
Note the tag latest-gpu in the pull command above: it selects the image built for GPU.
Second, run the Docker container:
sudo docker run -p 8502:8500 --mount type=bind,source=/my_model_dir,target=/models/inception --name tfserve_gpu -e MODEL_NAME=inception --gpus device=3 -t tensorflow/serving:latest-gpu
Note the argument --gpus device=3, which selects the GPU with index 3 (indices start at 0, so this is the fourth device). Change it to select a different GPU.
Verify that the container has started with the docker ps command.
Also verify that the GPU has been allocated to the tfserving container with the nvidia-smi command.
Output of nvidia-smi
There is one problem, though: the tfserving container has claimed all of the GPU's memory.
To restrict GPU memory usage, pass the per_process_gpu_memory_fraction flag:
sudo docker run -p 8502:8500 --mount type=bind,source=/my_model_dir,target=/models/inception --name tfserve_gpu -e MODEL_NAME=inception --gpus device=3 -t tensorflow/serving:latest-gpu --per_process_gpu_memory_fraction=0.02
Output of nvidia-smi
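The two docker run invocations above differ only in the trailing --per_process_gpu_memory_fraction flag. As a minimal sketch, the command can be assembled in Python so that the port, model directory, GPU index and memory fraction are changed in one place. The helper name build_serving_command and its parameters are hypothetical (they are not part of Docker or TensorFlow Serving), and sudo is left out.

```python
import shlex


def build_serving_command(model_dir, model_name, host_port=8502,
                          gpu_device=3, memory_fraction=None,
                          container_name="tfserve_gpu"):
    """Assemble the `docker run` argument list used above (hypothetical helper)."""
    cmd = [
        "docker", "run",
        "-p", "{}:8500".format(host_port),  # host port -> gRPC port in the container
        "--mount", "type=bind,source={},target=/models/{}".format(model_dir, model_name),
        "--name", container_name,
        "-e", "MODEL_NAME={}".format(model_name),
        "--gpus", "device={}".format(gpu_device),
        "-t", "tensorflow/serving:latest-gpu",
    ]
    if memory_fraction is not None:
        # Arguments after the image name are passed on to the model server.
        cmd.append("--per_process_gpu_memory_fraction={}".format(memory_fraction))
    return cmd


cmd = build_serving_command("/my_model_dir", "inception", memory_fraction=0.02)
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) if the container should be started from a script.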
Now tfserving is running in Docker on the GPU with reasonable GPU memory usage. Let's move on to the second problem.
Making GRPC client request
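The problem is the format of your gRPC client request. The error says that the model's input expects a float tensor but received a string: the client sent the raw bytes of the image file. The served model does not accept the encoded image directly; you have to decode the image into a float tensor and pass that tensor to the server.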
Here's the code for making the grpc client request.
from __future__ import print_function
import argparse
import grpc
import tensorflow as tf
from tensorflow.contrib.util import make_tensor_proto
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc


def read_tensor_from_image_file(file_name, input_height, input_width,
                                input_mean, input_std):
    # Decode the image file, resize it and normalise it into a float32
    # array of shape [1, input_height, input_width, 3].
    input_name = "file_reader"
    file_reader = tf.io.read_file(file_name, input_name)
    if file_name.endswith(".png"):
        image_reader = tf.image.decode_png(
            file_reader, channels=3, name="png_reader")
    elif file_name.endswith(".gif"):
        image_reader = tf.squeeze(
            tf.image.decode_gif(file_reader, name="gif_reader"))
    elif file_name.endswith(".bmp"):
        image_reader = tf.image.decode_bmp(file_reader, name="bmp_reader")
    else:
        image_reader = tf.image.decode_jpeg(
            file_reader, channels=3, name="jpeg_reader")
    float_caster = tf.cast(image_reader, tf.float32)
    dims_expander = tf.expand_dims(float_caster, 0)
    resized = tf.compat.v1.image.resize_bilinear(dims_expander, [input_height, input_width])
    normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
    # Cap the GPU memory that this client-side session may claim.
    sess = tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.01)))
    result = sess.run(normalized)
    return result


def run(host, port, image, model, signature_name):
    # Preparing tensor from the image
    tensor = read_tensor_from_image_file(file_name=image, input_height=224, input_width=224,
                                         input_mean=128, input_std=128)
    # Preparing the channel
    channel = grpc.insecure_channel('{host}:{port}'.format(host=host, port=port))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Preparing grpc request
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model
    request.model_spec.signature_name = signature_name
    request.inputs['image'].CopyFrom(make_tensor_proto(tensor, shape=[1, 224, 224, 3]))
    # Making predict request (10 second timeout)
    result = stub.Predict(request, 10.0)
    # Analysing result to get the prediction output.
    predictions = result.outputs['prediction'].float_val
    print("Predictions : ", predictions)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--host', help='Tensorflow server host name', default='localhost', type=str)
    parser.add_argument('--port', help='Tensorflow server port number', default=8502, type=int)
    parser.add_argument('--image', help='input image', default='bird.jpg', type=str)
    parser.add_argument('--model', help='model name', default='inception', type=str)
    parser.add_argument('--signature_name', help='Signature name of saved TF model',
                        default='serving_default', type=str)
    args = parser.parse_args()
    run(args.host, args.port, args.image, args.model, args.signature_name)
I'm not sure this is the best way to make a tfserving gRPC client request (the TensorFlow library is needed on the client side to prepare the tensor), but it works for me.
Suggestions are welcome.
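On the concern about needing TensorFlow on the client: TensorFlow Serving also exposes a REST API (port 8501 in the official Docker image) that accepts plain JSON, so a request can be built without TensorFlow. The following is only a sketch under several assumptions: the container would have to publish the REST port (for example with -p 8501:8501, which the commands above do not do), the image is assumed to be already decoded and resized to the model's input size by some image library, and the function names are hypothetical. The mean and std of 128 mirror the gRPC client above.

```python
import json


def build_predict_payload(pixels, input_mean, input_std,
                          input_name="image", signature_name="serving_default"):
    """Normalise an H x W x 3 nested list of pixel values and wrap it as one instance."""
    normalized = [
        [[(channel - input_mean) / input_std for channel in pixel] for pixel in row]
        for row in pixels
    ]
    return {
        "signature_name": signature_name,
        "instances": [{input_name: normalized}],  # one image -> batch of size 1
    }


def predict_url(host, rest_port, model):
    """URL of the REST predict endpoint for a served model."""
    return "http://{}:{}/v1/models/{}:predict".format(host, rest_port, model)


# A 1 x 2 RGB "image" stands in for a decoded, resized picture.
payload = build_predict_payload([[[0, 128, 255], [64, 64, 64]]],
                                input_mean=128, input_std=128)
print(predict_url("localhost", 8501, "inception"))
print(json.dumps(payload))
```

The JSON body can then be POSTed to the URL with any HTTP client. Whether the input key "image" is right depends on the signature of the exported model.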


How to create a data flow from a CSV file in Google Drive to a tf Dataset in Colab

Following the instructions in Colab, I could get a buffer and even load a pd.DataFrame from it (the file is just an example):
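# ... authentication
file_id = '1S1w0Z7g3bI1PGLPR49PW5VBRo7c_KYgU' # titanic
# loading data
import io
import pandas as pd
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
drive_service = build('drive', 'v3') # , credentials=creds
request = drive_service.files().get_media(fileId=file_id)
buf = io.BytesIO()
downloader = MediaIoBaseDownload(buf, request)
# Download the file chunk by chunk into the buffer
done = False
while not done:
    _, done = downloader.next_chunk()
buf.seek(0) # rewind before reading
df = pd.read_csv(buf)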
But I have trouble creating the data flow into a Dataset correctly. The "buf" variable does not work here:
dataset = tf.data.experimental.make_csv_dataset(buf,
    batch_size=100, num_epochs=1)
because only a CSV file path is accepted as the first argument. Is it possible in Colab to stream a CSV file from my Google Drive into a Dataset (used later in training)? And how can it be done in a memory-efficient manner?
I understand that I could perhaps make the file public in Google Drive and use its URL the simple way:
train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
dataset = tf.data.experimental.make_csv_dataset(train_file_path, batch_size=100, num_epochs=1)
But I DON'T want to share the real file. How can I keep the file private and still read it from Google Drive into a Dataset in Colab? (Preferably with the shortest code; the real project tested in Colab will have much more code.)
drive.CreateFile helped (link). As I understand it, working in Colab means working in a separate environment (separate from my PC and from the internet). So I tried (following the link):
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
link = ''  # paste the sharing link of the file here
fluff, file_id = link.split('=')
print(file_id) # Verify that you have everything after '='
downloaded = drive.CreateFile({'id': file_id})
# Copy the Drive file into the Colab file system
downloaded.GetContentFile('Filename.csv')
import tensorflow as tf
ds = tf.data.experimental.make_csv_dataset('Filename.csv', batch_size=100, num_epochs=1)
iterator = ds.as_numpy_iterator()
This works for me. Thanks for your interest in the topic (if anybody tried).
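The link.split('=') step above assumes the sharing link contains exactly one '='. A slightly more defensive sketch, assuming the link carries the file id in an id query parameter, parses the URL with the standard library instead. The function name drive_file_id and the example link are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def drive_file_id(link):
    """Return the value of the `id` query parameter of a sharing link."""
    query = parse_qs(urlparse(link).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' parameter found in link: {!r}".format(link))
    return ids[0]


# Hypothetical link; the extra parameter would break a plain split('=').
print(drive_file_id("https://drive.google.com/open?id=FILE_ID_PLACEHOLDER&usp=sharing"))
```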
Even simpler:
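import tensorflow as tf
# Load the Drive helper and mount
from google.colab import drive
drive.mount('/content/drive')
_types = [float(), float(), float(), float(), str()]
_lines = tf.data.TextLineDataset('/content/drive/My Drive/iris.csv')
# Skip the header row and parse each remaining line into typed columns
ds = _lines.skip(1).map(lambda x: tf.io.decode_csv(x, record_defaults=_types))
ds0 = ds.take(2)
print(*ds0.as_numpy_iterator(), sep='\n') # print list with sep => by rows.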
Or from a DataFrame (batched to save memory):
import numpy as np
import pandas as pd
import tensorflow as tf
# Load the Drive helper and mount
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv('/content/drive/My Drive/iris.csv', dtype='float32', converters={'variety': str}, nrows=20, decimal='.')
ds = tf.data.Dataset.from_tensor_slices(dict(df)) # if mixed types
ds = ds.shuffle(20, reshuffle_each_iteration=False) # for train.ds ONLY!
ds = ds.batch(batch_size=4)
ds = ds.prefetch(4)
# labels
label = ds.map(lambda x: x['variety'])
# features
# features = ds.map(lambda x: (x['sepal.length'], x['sepal.width']))
# Or with dynamic keys:
features = ds.map(lambda x: list(map(x.get, np.setdiff1d(list(x.keys()), ['variety']))))
Any further transformations can be applied in map.
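On the memory question raised in the original post, the idea behind batching is to hold only a few rows at a time. A minimal standard-library sketch of that idea reads a text stream in fixed-size batches. The function name iter_csv_batches and the sample rows are made up for illustration; a bytes buffer such as the BytesIO from the question would first need wrapping in io.TextIOWrapper.

```python
import csv
import io
from itertools import islice


def iter_csv_batches(text_stream, batch_size):
    """Yield lists of row dicts, reading at most `batch_size` rows at a time."""
    reader = csv.DictReader(text_stream)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Made-up in-memory sample using the column names from the snippets above.
sample = io.StringIO(
    "sepal.length,sepal.width,variety\n"
    "5.1,3.5,Setosa\n"
    "4.9,3.0,Setosa\n"
    "6.3,3.3,Virginica\n"
)
for batch in iter_csv_batches(sample, batch_size=2):
    print([row["variety"] for row in batch])
```

A generator like this could in principle feed tf.data.Dataset.from_generator, although TextLineDataset as shown above already streams the file line by line.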

How to prepare warmup request file for tensorflow serving?

The current version of tensorflow-serving tries to load warmup requests from the assets.extra/tf_serving_warmup_requests file.
2018-08-16 16:05:28.513085: I tensorflow_serving/servables/tensorflow/] No warmup data file found at /tmp/faster_rcnn_inception_v2_coco_2018_01_28_string_input_version-export/1/assets.extra/tf_serving_warmup_requests
Does TensorFlow provide a common API for exporting requests to that location, or should we write the requests there manually?
At this point there is no common API for exporting warmup data into assets.extra. It's relatively simple to write a script (similar to the one below):
import tensorflow as tf
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2


def main():
    with tf.python_io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="<add here>"),
            inputs={"examples": tf.make_tensor_proto([<add here>])}
        )
        # Wrap the request in a PredictionLog record and write it out
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())


if __name__ == "__main__":
    main()
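We referred to the official doc.
Specifically, we used Classification instead of Prediction, so we changed that code to:
log = prediction_log_pb2.PredictionLog(
    classify_log=prediction_log_pb2.ClassifyLog(request=request))  # request is a ClassificationRequest here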
This is a complete example for an object detection system using a ResNet model. The prediction request consists of an image.
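import tensorflow as tf
import requests
from tensorflow.python.framework import tensor_util
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2

IMAGE_URL = "<add here>"  # URL of a sample image (not given in the original)
NUM_RECORDS = 100  # example value; the original count is not shown


def get_image_bytes():
    image_content = requests.get(IMAGE_URL, stream=True)
    image_content.raise_for_status()
    return image_content.content


def main():
    """Generate TFRecords for warming up."""
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        image_bytes = get_image_bytes()
        predict_request = predict_pb2.PredictRequest()
        predict_request.model_spec.name = 'resnet'
        predict_request.model_spec.signature_name = 'serving_default'
        # The input key 'image_bytes' is an assumption; use your signature's input name.
        predict_request.inputs['image_bytes'].CopyFrom(
            tensor_util.make_tensor_proto([image_bytes], tf.string))
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=predict_request))
        # Write the same request NUM_RECORDS times
        for r in range(NUM_RECORDS):
            writer.write(log.SerializeToString())


if __name__ == "__main__":
    main()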
This script creates a file called “tf_serving_warmup_requests”.
I moved this file to /your_model_location/resnet/1538687457/assets.extra/ and then restarted my Docker container to pick up the change.
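Moving the file into place can be scripted as well. The sketch below copies a generated warmup file into the assets.extra directory of one model version, following the directory layout described above. The function name install_warmup_file is hypothetical, and the demo runs in a temporary directory rather than a real model location.

```python
import shutil
import tempfile
from pathlib import Path


def install_warmup_file(warmup_file, model_base_dir, version):
    """Copy a warmup file to <model_base_dir>/<version>/assets.extra/."""
    target_dir = Path(model_base_dir) / str(version) / "assets.extra"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "tf_serving_warmup_requests"
    shutil.copyfile(str(warmup_file), str(target))
    return target


# Demo in a throwaway directory; the empty file stands in for the generated one.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "generated_warmup_file"
    source.write_bytes(b"")
    installed = install_warmup_file(source, Path(tmp) / "resnet", 1538687457)
    print(installed.relative_to(tmp))
```

The model server still has to be restarted (or the version reloaded) before it reads the new file.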

Tensorflow profiling timeline inconsistent results

I want to profile my Tensorflow models using the timeline feature.
The issue is that I get inconsistent total runtime results.
Wall time
Apart from the TensorFlow timeline, I am also using time.time() to measure the runtime.
The first run takes about 1.6 s.
The second run takes about 0.03 s.
This makes sense, as TensorFlow does some optimization-related initialization first. But I can't see that in the timeline trace picture.
In the timeline, each run takes approx. 0.3 s instead of 1.6 s and 0.03 s.
The graph is already loaded before this code runs:
import os
import tempfile
import json
import time
import numpy as np
import tensorflow as tf
from tensorflow.python.client import timeline


class TimeLiner:
    _timeline_dict = None

    def update_timeline(self, chrome_trace):
        # convert chrome trace to python dict
        chrome_trace_dict = json.loads(chrome_trace)
        # for first run store full trace
        if self._timeline_dict is None:
            self._timeline_dict = chrome_trace_dict
        # for other - update only time consumption, not definitions
        else:
            for event in chrome_trace_dict['traceEvents']:
                # events time consumption started with 'ts' prefix
                if 'ts' in event:
                    self._timeline_dict['traceEvents'].append(event)

    def save(self, f_name):
        with open(f_name, 'w') as f:
            json.dump(self._timeline_dict, f)


with tf.Session(graph=graph) as sess:
    # add additional options to trace the session execution
    options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    input = graph.get_tensor_by_name("inference/input_1_1:0")
    output = graph.get_tensor_by_name("inference/output_node0:0")
    many_runs_timeline = TimeLiner()
    for i in range(2):
        start_time = time.time()
        print(len(sess.run(output, feed_dict={input: np.ones((1,256,640,3))}, options=options, run_metadata=run_metadata)))
        stop_time = time.time()
        print(stop_time - start_time)
        # Create the Timeline object, and write it to a json file
        fetched_timeline = timeline.Timeline(run_metadata.step_stats)
        chrome_trace = fetched_timeline.generate_chrome_trace_format(show_memory=True)
        many_runs_timeline.update_timeline(chrome_trace)
    many_runs_timeline.save('timeline.json')  # output file name is an assumption
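The wall-clock numbers quoted above (about 1.6 s, then 0.03 s) mix one-off setup cost with steady-state cost. A small sketch of how the two could be reported separately, independent of the timeline trace, is shown below. The helper name time_runs is hypothetical, and the lambda is a stand-in workload where the session call being timed would go. It does not by itself explain the 0.3 s shown in the trace.

```python
import time


def time_runs(fn, num_runs, warmup_runs=1):
    """Time `fn` and report warm-up runs separately from steady-state runs."""
    def timed():
        start = time.perf_counter()
        fn()
        return time.perf_counter() - start

    warmup = [timed() for _ in range(warmup_runs)]
    steady = [timed() for _ in range(num_runs)]
    return warmup, steady


# Stand-in workload; replace the lambda with the call that should be measured.
warmup, steady = time_runs(lambda: sum(range(100000)), num_runs=5)
print("warm-up:", ["%.4f" % t for t in warmup])
print("steady: ", ["%.4f" % t for t in steady])
```

Timing a few runs with and without the trace options may also be worthwhile, since full tracing can add overhead of its own.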

Tensorboard: Export CSV file from command line

Could someone please tell me whether TensorBoard supports exporting CSV files from the command line? I ask because I have a lot of logging directories and would like a script that automates the process. Thanks.
The API supports reading event files programmatically. Here's an example that extracts the data for one tag and saves it to a .csv file in a format similar to the files generated by TensorBoard:
import argparse
import numpy as np
import tensorflow as tf


def save_tag_to_csv(fn, tag='test_metric', output_fn=None):
    if output_fn is None:
        output_fn = '{}.csv'.format(tag.replace('/', '_'))
    print("Will save to {}".format(output_fn))
    sess = tf.InteractiveSession()
    wall_step_values = []
    with sess.as_default():
        # Collect (wall time, step, value) for every event carrying the tag
        for e in tf.train.summary_iterator(fn):
            for v in e.summary.value:
                if v.tag == tag:
                    wall_step_values.append((e.wall_time, e.step, v.simple_value))
    np.savetxt(output_fn, wall_step_values, delimiter=',', fmt='%10.5f')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('fn')  # path to the event file
    parser.add_argument('--tag', default='test_metric')
    args = parser.parse_args()
    save_tag_to_csv(args.fn, tag=args.tag)
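Since the question mentions many logging directories, the script above could be driven by a small directory walker. The sketch below finds event files (TensorBoard names them events.out.tfevents.*) under a root directory and pairs each with a CSV output path. The function name find_event_files and the output naming scheme are hypothetical, and the demo uses a temporary directory with a fake event file.

```python
import os
import tempfile


def find_event_files(root_dir):
    """Yield (event_file_path, csv_output_path) pairs for event files under root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if "tfevents" in name:
                event_path = os.path.join(dirpath, name)
                yield event_path, event_path + ".csv"


# Demo on a throwaway directory tree with one fake event file.
with tempfile.TemporaryDirectory() as root:
    run_dir = os.path.join(root, "run1")
    os.makedirs(run_dir)
    open(os.path.join(run_dir, "events.out.tfevents.0.host"), "w").close()
    for event_path, csv_path in find_event_files(root):
        print(os.path.relpath(event_path, root), "->", os.path.relpath(csv_path, root))
        # save_tag_to_csv(event_path, tag="test_metric", output_fn=csv_path)
```

Uncommenting the last line would call save_tag_to_csv from the script above once per event file.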

Running distributed training with tf.learn

I've been trying to set up a distributed cluster running the Boston Housing example mentioned in the TensorFlow tutorial but so far I'm a bit lost. Googling or searching in the tutorials was no help.
"""DNNRegressor with custom input_fn for Housing dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import json
import os
import pandas as pd
import tensorflow as tf

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"


def input_fn(data_set):
    feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
    labels = tf.constant(data_set[LABEL].values)
    return feature_cols, labels


def main(unused_argv):
    # Load datasets
    training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                               skiprows=1, names=COLUMNS)
    test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
    # Set of 6 examples for which to predict median house values
    prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                                 skiprows=1, names=COLUMNS)
    # Feature cols
    feature_cols = [tf.contrib.layers.real_valued_column(k)
                    for k in FEATURES]
    cluster = {'ps': ['', ''],
               'worker': ['', '']}
    os.environ['TF_CONFIG'] = json.dumps(
        {'cluster': cluster,
         'task': {'type': 'worker', 'index': 0}})
    # Build 2 layer fully connected DNN with 10, 10 units respectively.
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols,
                                              hidden_units=[10, 10])
    # Fit
    regressor.fit(input_fn=lambda: input_fn(training_set), steps=5000)
    # Score accuracy
    ev = regressor.evaluate(input_fn=lambda: input_fn(test_set), steps=1)
    loss_score = ev["loss"]
    print("Loss: {0:f}".format(loss_score))
    # Print out predictions
    y = regressor.predict(input_fn=lambda: input_fn(prediction_set))
    # .predict() returns an iterator; convert to a list and print predictions
    predictions = list(itertools.islice(y, 6))
    print("Predictions: {}".format(str(predictions)))


if __name__ == "__main__":
    tf.app.run()
I'm not sure I've set up TF_CONFIG correctly here. I used a cluster of 4 machines (two PSs and two workers), but I set neither 'environment' nor a 'master' machine in the cluster. I first started the two PSs, and when I then ran the two workers, they got stuck right after "INFO:tensorflow:Create CheckpointSaverHook." Did I do anything wrong here?
I appreciate your help.
I had the exact same problem. The issue is that the grpc server never actually gets started. I made the same assumption you did, that tf.learn starts the grpc server, but it does not. You can start a server from inside your Python script. Then, depending on whether the process is running a 'ps' or a 'worker' task, you either call server.join() or run the rest of your model's code:
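import json
import os
import sys
import tensorflow as tf

job = sys.argv[1]
task = int(sys.argv[2])
cluster = {'worker': ['localhost:2223'],
           'ps': ['localhost:2222']}
os.environ['TF_CONFIG'] = json.dumps({'cluster': cluster,
                                      'task': {'type': job, 'index': task}})
# Create the server
server = tf.train.Server(cluster,
                         job_name=job,
                         task_index=task)
if job == "ps":
    # Parameter servers only serve variables: block forever
    server.join()
elif job == "worker":
    # Load input
    # ... and run the rest of your model's code
    pass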
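Because every process in the cluster needs its own TF_CONFIG value, it can help to generate and check these strings in one place. The following is a minimal sketch using the cluster from the snippet above. The helper name make_tf_config is hypothetical, and the range check is an added convenience rather than something TensorFlow requires from the caller.

```python
import json


def make_tf_config(cluster, job, task_index):
    """Return the TF_CONFIG JSON string for one process of the cluster."""
    if job not in cluster:
        raise ValueError("unknown job {!r}; expected one of {}".format(job, sorted(cluster)))
    if not 0 <= task_index < len(cluster[job]):
        raise ValueError("task index {} out of range for job {!r}".format(task_index, job))
    return json.dumps({"cluster": cluster,
                       "task": {"type": job, "index": task_index}})


cluster = {"worker": ["localhost:2223"], "ps": ["localhost:2222"]}
# Print one TF_CONFIG value per process that has to be started.
for job, addresses in sorted(cluster.items()):
    for index in range(len(addresses)):
        print(job, index, make_tf_config(cluster, job, index))
```

Each printed string is what os.environ['TF_CONFIG'] would be set to in the corresponding process.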
For more information, checkout:
