I have a TensorFlow Keras model which is stored in .pb format and from .pb format I am converting the model to .onnx format using the tf2onnx model
!python -m tf2onnx.convert --saved-model model.pb --output model.onnx
now after converting I see that my input layer is in NHWC format and I need to convert the same to NCHW, to achieve that I am using
!python -m tf2onnx.convert --saved-model model.pb --output model_3.onnx --inputs-as-nchw input0:0
which is still giving me the same output as NHWC
I have to consume the above model in NVIDIA Deepstream which only accepts NCHW format.
I found this link which talks about the transpose of the input layer, but unfortunately, that is also not working.
Convert between NHWC and NCHW in TensorFlow
#import tensorflow as tf
images_nhwc = tf.compat.v1.placeholder(tf.float32, [1, 200, 300, 3])
# input batch
out = tf.transpose(images_nhwc, [0, 3, 1, 2])
#print(out.get_shape())
model.build(out.get_shape())
It would be really helpful if some experts can share their thoughts on how to convert NHWC to NCHW
I found the solution.
I had to take the latest code of tf2onnx.convert.from_keras. I took the main branch from tf2onnx
!pip install --force-reinstall git+https://github.com/onnx/tensorflow-onnx.git#main
!pip show tf2onnx
!pip freeze | grep tf2onnx
once that was done I was able to load the latest functionality and updated code at
https://github.com/onnx/tensorflow-onnx/tree/e896723e410a59a600d1a73657f9965a3cbf2c3b .
Below is the code I used to convert my model from .pb to .onnx along with NHWC to NCHW.
# give the list of *inputs* which should be converted and returned *as nchw*
_INPUT = model.input.name
model_proto, external_tensor_storage = tf2onnx.convert.from_keras(model, inputs_as_nchw=[_INPUT])
The biggest catch about the above code was [_INPUT] which was suppose to be a list and I was able find this information in the test cases.
Related
I am trying to speed up the segmentation model(unet-mobilenet-512x512). I converted my tensorflow model to tensorRT with FP16 precision mode. And the speed is lower than I expected.
Before the optimization i had 7FPS on inference with .pb frozen graph. After tensorRT oprimization I have 14FPS.
Here is benchmark results of Jetson NX from their site
You can see, that unet 256x256 segmentation speed is 146 FPS. I thought, the speed of my unet512x512 should be 4 times slower in the worst case.
Here is my code for optimizing tensorflow saved model using TensorRt:
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf
params = trt.DEFAULT_TRT_CONVERSION_PARAMS
params = params._replace(
max_workspace_size_bytes=(1<<32))
params = params._replace(precision_mode="FP16")
converter = tf.experimental.tensorrt.Converter(input_saved_model_dir='./model1', conversion_params=params)
converter.convert()
def my_input_fn():
inp1 = np.random.normal(size=(1, 512, 512, 3)).astype(np.float32)
yield [inp1]
converter.build(input_fn=my_input_fn) # Generate corresponding TRT engines
output_saved_model_dir = "trt_graph2"
converter.save(output_saved_model_dir) # Generated engines will be saved.
print("------------------------freezing the graph---------------------")
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
saved_model_loaded = tf.saved_model.load(
output_saved_model_dir, tags=[tf.compat.v1.saved_model.SERVING])
graph_func = saved_model_loaded.signatures[
tf.compat.v1.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_variables_to_constants_v2(
graph_func)
frozen_func.graph.as_graph_def()
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
logdir="./",
name="unet_frozen_graphTensorRt.pb",
as_text=False)
I downloaded the repository, that was used for Jetson NX benchmarking ( https://github.com/NVIDIA-AI-IOT/jetson_benchmarks ), and the speed of unet256x256 really is ~146FPS. But there is no pipeline to optimize the model.
How can I get the similar results? I am looking for the solutions to get speed of my model(unet-mobilenet-512x512) close to 30FPSMaybe I should run inference in other way(without tensorflow) or change some converting parameters?
Any suggestions, thanks
As far as I can see, the repository you linked to uses command line tools that use TensorRT (TRT) under the hood. Note that TensorRT is not the same as "TensorRT in TensorFlow" aka TensorFlow-TensorRT (TF-TRT) which is what you are using in your code. Both TF-TRT and TRT models run faster than regular TF models on a Jetson device but TF-TRT models still tend to be slower than TRT ones (source 1, source 2).
The downside of TRT is that the conversion to TRT needs to be done on the target device and that it can be quite difficult to implement it successfully as there are various TensorFlow operations that TRT does not support (in which case you need to write a custom plugin or pray to God that someone on the internet has already done so. …or use TensorRT only for part of your model and do pre-/postprocessing in TensorFlow).
There are basically two ways to convert models from TensorFlow models to TensorRT "engines" aka "plan files", both of which use intermediate formats:
TF -> UFF -> TRT
TF -> ONNX -> TRT
In both cases, the graphsurgeon/onnx-graphsurgeon libraries can be used to modify the TF/ONNX graph to achieve compatibility of graph operations. Unsupported operations can be added by means of TensorRT plugins, as mentioned above. (This is really the main challenge here: Different graph file formats and different target GPUs support different graph operations.)
There's also a third way where you do TF -> Caffe -> TRT and apparently a fourth one where you use Nvidia's Transfer Learning Toolkit (TLT) (based upon TF/Keras) and a tool called tlt-converter but I'm not familiar with it. The latter link does mention converting a UNet model, though.
Note that the paths involving UFF and Caffe are now deprecated and support will be removed in TensorRT 9.0, so if you want something future-proof, you should probably go for ONNX. That being said, most sample code online I've come across online still uses UFF and TensorRT 9.0 is still some time away.
Anyway, I haven't tried converting a UNet to TensorRT yet, but the following repositories provide sample code which might give you an idea of how it works in principle:
TF -> UFF -> TRT: jkjung-avt/tensorrt_demos, NVIDIA-AI-IOT/tf_to_trt_image_classification (the latter using a bit of C++)
TF -> ONNX -> TRT: tensorflow-onnx, onnx-tensorrt
Keras -> ONNX -> TRT: Nvidia blog post (This one mentions converting a Unet to TRT!)
Note that even if you don't manage to pull off the conversion from ONNX to TRT for your model, using the ONNX runtime for inference could potentially still give you a performance gain, especially when you're using the CUDA or the TensorRT execution provider which will be enabled automatically provided you're on a Jetson device and running the correct ONNXRuntime build. (I'm not sure how it compares to TF-TRT or TRT, though, but it might still be worth a shot.)
Finally, for completeness's sake let me also mention that at least my team has been dabbling with the idea of switching from TF to PyTorch, partly because the Nvidia support has been getting a lot better lately and Nvidia employees seem to gravitate towards PyTorch, too. In particular, there are now two separate ways to convert models to TRT:
PyTorch -> ONNX -> TRT (used by dusty_nv)
PyTorch -> TRT (direct conversion via torch2trt). It seems that quite a few Nvidia repositories use this.
Hi can you share the errors you are getting? Its should work with the following steps:
Convert the TensorFlow/Keras model to a .pb file.
Convert the .pb file to ONNX format.
Create a TensorRT engine.
Run inference from the TensorRT engine.
I am not sure about Unet (I will check) but you may have some operations not supported by onnx (please share your errors).
Here is an example with Resnet-50.
Conversion to .pb:
import tensorflow as tf
import keras
from tensorflow.keras.models import Model
import keras.backend as K
K.set_learning_phase(0)
def keras_to_pb(model, output_filename, output_node_names):
"""
This is the function to convert the Keras model to pb.
Args:
model: The Keras model.
output_filename: The output .pb file name.
output_node_names: The output nodes of the network. If None, then
the function gets the last layer name as the output node.
"""
# Get the names of the input and output nodes.
in_name = model.layers[0].get_output_at(0).name.split(':')[0]
if output_node_names is None:
output_node_names = [model.layers[-1].get_output_at(0).name.split(':')[0]]
sess = keras.backend.get_session()
# The TensorFlow freeze_graph expects a comma-separated string of output node names.
output_node_names_tf = ','.join(output_node_names)
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
sess,
sess.graph_def,
output_node_names)
sess.close()
wkdir = ''
tf.train.write_graph(frozen_graph_def, wkdir, output_filename, as_text=False)
return in_name, output_node_names
# load the ResNet-50 model pretrained on imagenet
model = keras.applications.resnet.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
# Convert the Keras ResNet-50 model to a .pb file
in_tensor_name, out_tensor_names = keras_to_pb(model, "models/resnet50.pb", None)
Then you need to convert the .pb model to the ONNX format. To do this, you will need to install tf2onnx.
Example:
python -m tf2onnx.convert --input /Path/to/resnet50.pb --inputs input_1:0 --outputs probs/Softmax:0 --output resnet50.onnx
Last step create the TensorRT engine from the ONNX file:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
def build_engine(onnx_path, shape = [1,224,224,3]):
"""
This is the function to create the TensorRT engine
Args:
onnx_path : Path to onnx_file.
shape : Shape of the input of the ONNX file.
"""
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(1) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
builder.max_workspace_size = (256 << 20)
with open(onnx_path, 'rb') as model:
parser.parse(model.read())
network.get_input(0).shape = shape
engine = builder.build_cuda_engine(network)
return engine
def save_engine(engine, file_name):
buf = engine.serialize()
with open(file_name, 'wb') as f:
f.write(buf)
def load_engine(trt_runtime, plan_path):
with open(engine_path, 'rb') as f:
engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)
return engine
I suggest you check this Pytorch TRT Unet implementation
I have downloaded a pre-trained PoseNet model for Tensorflow.js (tfjs) from Google, so its a json file.
However, I want to use it on Android, so I need the .tflite model. Although someone has 'ported' a similar model from tfjs to tflite here, I have no idea what model (there are many variants of PoseNet) they converted. I want to do the steps myself. Also, I don't want to run some arbitrary code someone uploaded into a file in stackOverflow:
Caution: Be careful with untrusted code—TensorFlow models are code. See Using TensorFlow Securely for details. Tensorflow docs
Does anyone know any convenient ways to do this?
You can find out what tfjs format you have by looking in the json file. It often says "graph-model". The difference between them are here.
From tfjs graph model to SavedModel (more common)
Use tfjs-to-tf by Patrick Levin.
import tfjs_graph_converter.api as tfjs
tfjs.graph_model_to_saved_model(
"savedmodel/posenet/mobilenet/float/050/model-stride16.json",
"realsavedmodel"
)
# Code below taken from https://www.tensorflow.org/lite/convert/python_api
converter = tf.lite.TFLiteConverter.from_saved_model("realsavedmodel")
tflite_model = converter.convert()
# Save the TF Lite model.
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
f.write(tflite_model)
From tfjs layers model to SavedModel
Note: This will only work for layers model format, not graph model format as in the question. I've written the difference between them here.
Install and use tensorflowjs-convert to convert the .json file into a Keras HDF5 file (from another SO thread).
On mac, you'll face issues running pyenv (fix) and on Z-shell, pyenv won't load correctly (fix). Also, once pyenv is running, use python -m pip install tensorflowjs instead of pip install tensorflowjs, because pyenv did not change python used by pip for me.
Once you've followed the tensorflowjs_converter guide, run tensorflowjs_converter to verify it works with no errors, and should just warn you about Missing input_path argument. Then:
tensorflowjs_converter --input_format=tfjs_layers_model --output_format=keras tfjs_model.json hdf5_keras_model.hdf5
Convert the Keras HDF5 file into a SavedModel (standard Tensorflow model file) or directly into .tflite file using the TFLiteConverter. The following runs in a Python file:
# Convert the model.
model = tf.keras.models.load_model('hdf5_keras_model.hdf5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the TF Lite model.
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
f.write(tflite_model)
or to save to a SavedModel:
# Convert the model.
model = tf.keras.models.load_model('hdf5_keras_model.hdf5')
tf.keras.models.save_model(
model, filepath, overwrite=True, include_optimizer=True, save_format=None,
signatures=None, options=None
)
I created a SavedModel using the Universal Sentence Encoder Lite version. If I load the SavedModel using tf.saved_model.loader.load, it works perfectly fine.
However, if I try to serve the model using Tensorflow Serving, I'm getting the following error:
"error": "indices[3] = 1 is not in [0, 1)\n\t [[Node:
lite_module_apply_default/Encoder_en/KonaTransformer/ClipToMaxLength/GatherV2_1
= GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_INT64, _output_shapes=[[?]], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_qsph_v_0_7,
lite_module_apply_default/Encoder_en/KonaTransformer/ClipToMaxLength/Reshape,
lite_module_apply_default_1/Encoder_en/KonaTransformer/SequenceMask/Const)]]"
Any reasons why that would be occurring?
python version: 3.6
tensorflow version: 1.9.0
tensorflow_hub version: 0.1.1
Using tensorflow/serving docker 1.11
I was giving the input tensors in row format. By changing the format of the input tensors to columnar format, I was able to rectify the issue. A detailed description of row and columnar formats can be found here.
I get the pre-trained .pb file of MobileNet and find it's not quantized while the fully quantized model should be converted into .tflite format. Since I'm not familiar with tools for mobile app developing, how can I get the fully quantized weights of MobileNet from .tflite file. More precisely, how can I extract quantized parameters and view its numerical values ?
The Netron model viewer has nice view and export of data, as well as a nice network diagram view.
https://github.com/lutzroeder/netron
I'm also in the process of studying how TFLite works. What I found may not be the best approach and I would appreciate any expert opinions. Here's what I found so far using flatbuffer python API.
First you'll need to compile the schema with flatbuffer. The output will be a folder called tflite.
flatc --python tensorflow/contrib/lite/schema/schema.fbs
Then you can load the model and get the tensor you want. Tensor has a method called Buffer() which is, according to the schema,
An index that refers to the buffers table at the root of the model.
So it points you to the location of the data.
from tflite import Model
buf = open('/path/to/mode.tflite', 'rb').read()
model = Model.Model.GetRootAsModel(buf, 0)
subgraph = model.Subgraphs(0)
# Check tensor.Name() to find the tensor_idx you want
tensor = subgraph.Tensors(tensor_idx)
buffer_idx = tensor.Buffer()
buffer = model.Buffers(buffer_idx)
After that you'll be able to read the data by calling buffer.Data()
Reference:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/schema/schema.fbs
https://github.com/google/flatbuffers/tree/master/samples
Using TensorFlow 2.0, you can extract the weights and some information regarding the tensor (shape, dtype, name, quantization) with the following script - inspired from TensorFlow documentation
import tensorflow as tf
import h5py
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="v3-large_224_1.0_uint8.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# get details for each layer
all_layers_details = interpreter.get_tensor_details()
f = h5py.File("mobilenet_v3_weights_infos.hdf5", "w")
for layer in all_layers_details:
# to create a group in an hdf5 file
grp = f.create_group(str(layer['index']))
# to store layer's metadata in group's metadata
grp.attrs["name"] = layer['name']
grp.attrs["shape"] = layer['shape']
# grp.attrs["dtype"] = all_layers_details[i]['dtype']
grp.attrs["quantization"] = layer['quantization']
# to store the weights in a dataset
grp.create_dataset("weights", data=interpreter.get_tensor(layer['index']))
f.close()
You can view it using Netron app
macOS: Download the .dmg file or run brew install netron
Linux: Download the .AppImage file or run snap install netron
Windows: Download the .exe installer or run winget install netron
Browser: Start the browser version.
Python Server: Run pip install netron and netron [FILE] or netron.start('[FILE]').
OS Platform and Distribution: Linux Ubuntu 14.04
TensorFlow version: tensorflow (1.4.0) from binary,
CUDA/cuDNN version: cuda 8.0
I have trained a customized model with tensorflow and I am trying to make it a tensorflow lite model for mobile apps. My model defines like:
def P_Net(inputs,label=None,bbox_target=None,landmark_target=None,training=True):
#define common param
with slim.arg_scope([slim.conv2d],
activation_fn=prelu,
weights_initializer=slim.xavier_initializer(),
biases_initializer=tf.zeros_initializer(),
weights_regularizer=slim.l2_regularizer(0.0005),
padding='valid'):
print inputs.get_shape()
net = slim.conv2d(inputs, 28, 3, stride=1,scope='conv1')
......
conv4_1 = slim.conv2d(net,num_outputs=2,kernel_size=[1,1],stride=1,scope='conv4_1',activation_fn=tf.nn.softmax)
#conv4_1 = slim.conv2d(net,num_outputs=1,kernel_size=[1,1],stride=1,scope='conv4_1',activation_fn=tf.nn.sigmoid)
print conv4_1.get_shape()
#batch*H*W*4
bbox_pred = slim.conv2d(net,num_outputs=4,kernel_size=[1,1],stride=1,scope='conv4_2',activation_fn=None)
print bbox_pred.get_shape()
where conv4_1 and conv4_2 is the output layer.
I freeze the model with:
freeze_graph.freeze_graph('out_put_model/model.pb', '', False, model_path, 'Squeeze,Squeeze_1', '', '', 'out_put_model/frozen_model.pb', '', '')
After that, I could use tensorboard to view graphs. When I read it back to double check it, I get identical info to the checkpoint model.
Then, I try to save the frozen_model.pb to tensorflow lite model. While tensorflow 1.4.0 doesn't have tensorflow lite module, I checkout tensorflow from github and bazel run toco like:
bazel run --config=opt //tensorflow/contrib/lite/toco:toco -- --input_file='/home/sens/mtcnn_cat/MTCNN-Tensorflow/test/out_put_model/frozen_model.pb' --output_file='/home/sens/mtcnn_cat/MTCNN-Tensorflow/test/out_put_model/pnet.tflite' --inference_type=FLOAT --input_shape=1,128,128,3 --input_array=image_height,image_width,input_image --output_array=Squeeze,Squeeze_1 --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --dump_graphviz=/tmp
However, I get error about output array not found:
INFO: Running command line: bazel-bin/tensorflow/contrib/lite/toco/toco '--input_file=/home/sens/mtcnn_cat/MTCNN-Tensorflow/test/out_put_model/frozen_model.pb' '--output_file=/home/sens/mtcnn_cat/MTCNN-Tensorflow/test/out_put_model/pnet.tflite' '--inference_type=FLOAT' '--input_shape=1,128,128,3' '--input_array=image_height,image_width,input_image' '--output_array=Squeeze,Squeeze_1' '--input_format=TENSORFLOW_GRAPHDEF' '--output_format=TFLITE' '--dump_graphviz=/tmp'
2018-04-03 11:17:37.412589: I tensorflow/contrib/lite/toco/import_tensorflow.cc:1172] Converting unsupported operation: Abs
2018-04-03 11:17:37.412660: I tensorflow/contrib/lite/toco/import_tensorflow.cc:1172] Converting unsupported operation: Abs
2018-04-03 11:17:37.412699: I tensorflow/contrib/lite/toco/import_tensorflow.cc:1172] Converting unsupported operation: Abs
2018-04-03 11:17:37.412880: F tensorflow/contrib/lite/toco/tooling_util.cc:686] Check failed: model.HasArray(output_array) Output array not found: Squeeze,Squeeze_1
Question:
How to set the --output_array=Squeeze,Squeeze_1 parameter? I think it's the same as output nodes in freeze_graph() in tensorboard, I do find the "Squeeze" and "Squeeze_1" node
How to set the --input_shape=1,128,128,3 --input_array=image_height,image_width,input_image parameter? I check and find the mobile do have a fixed size image input, but in my model, there's not fixed size of input image and fully convolution input like:
self.image_op = tf.placeholder(tf.float32, name='input_image')
self.width_op = tf.placeholder(tf.int32, name='image_width')
self.height_op = tf.placeholder(tf.int32, name='image_height')
image_reshape = tf.reshape(self.image_op, [1, self.height_op, self.width_op, 3])
and a reshape to 1widthheight*3
So, how to write this as input shape?
For Input array
[node.op.name for node in model.inputs]
For Output array
[node.op.name for node in model.outputs]
converting frozen model to tf_lite has never been a easy job, thanks to tensorflow. hope this code can help you summarize the graph and help you find output and input array
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph={PATH_TO_FROZEN_GRAPH}/optimized_best.pb`
I came across this issue when trying to retrain and then convert to tflite.
This is the solution that worked for me:
With 1.9 and above (and possibly 1.8 too, haven't tested.) you need to drop the --input_format field and change the --input_file param to --graph_def_file
So you end up with a command that looks a bit like:
toco \
--graph_def_file=tf_files/retrained_graph.pb \
--output_file=tf_files/optimized_graph.lite \
--output_format=TFLITE \
--input_shape=1,${IMAGE_SIZE},${IMAGE_SIZE},3 \
--input_array=input \
--output_array=final_result \
--inference_type=FLOAT \
--inference_input_type=FLOAT
I was then able to complete the poets example and get my tflite file to work on android.
Source:
https://github.com/googlecodelabs/tensorflow-for-poets-2/issues/68
You can use below tool to determine input and output array and model size along with other parameter for tflite conversion. This also creates a nice visualization of tensorflow frozen graph.
Github link for the tool