How to visualize feature maps of a TensorFlow Lite model?

I used Keract to visualize the feature maps of a TensorFlow/Keras model.
I then quantized the model with TensorFlow Lite, and I would like to visualize the feature maps the TensorFlow Lite model generates during inference. Do you know a way to do this?
The reason is that I don't fully understand the interaction between weights, activations and scale/zero-point coefficients. So I would like to do the inference process step by step for a quantized network.
Thank you for your help

There are several ways to extract information about weights, scales and zero-point values.
Way one:
You can find additional information about the code below on the TensorFlow website.
import tensorflow as tf
import numpy as np
# Load your TFLite model.
TF_LITE_MODEL_FILE_NAME = "Your_TFLite_file.tflite"
interpreter = tf.lite.Interpreter(model_path=TF_LITE_MODEL_FILE_NAME)
# Input and output details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Details of every tensor, listed by index. The quantization parameters are found here.
interpreter.get_tensor_details()
# Read an individual tensor with interpreter.get_tensor(index).
# Take the index of the tensor you want from get_tensor_details().
interpreter.allocate_tensors()
r = interpreter.get_tensor(12).astype(np.float32)  # 12 is only an example index
print('Tensors', r)
Way two (the easy way):
Upload your TFLite file to the Netron website, where you can inspect a lot of information about it. You can also install Netron on your PC; the Netron Git repository has the installation instructions.
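As a rough illustration of how the scale and zero-point values listed by get_tensor_details() relate to real values: TFLite's affine quantization maps a stored integer q to real_value = scale * (q - zero_point). The sketch below applies that mapping in plain Python. The scale, zero-point and integer values are made up for the example and are not taken from any real model.

```python
def dequantize(q_values, scale, zero_point):
    """Map stored integers back to approximate real values."""
    return [scale * (q - zero_point) for q in q_values]


def quantize(real_values, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to int8, rounding and clamping to the int8 range."""
    quantized = []
    for x in real_values:
        # Python's round() rounds ties to even, which may differ from TFLite.
        q = round(x / scale) + zero_point
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


# Hypothetical parameters, chosen so the arithmetic is exact in floating point.
scale, zero_point = 0.5, 10
stored = [10, 12, 6]

restored = dequantize(stored, scale, zero_point)
print(restored)                               # [0.0, 1.0, -2.0]
print(quantize(restored, scale, zero_point))  # [10, 12, 6]
```

Applying dequantize to the integer tensor of a layer, with that tensor's own scale and zero-point, gives the approximate real-valued weights or activations.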

Related

Jetson NX optimize tensorflow model using TensorRT

I am trying to speed up a segmentation model (unet-mobilenet-512x512). I converted my TensorFlow model to TensorRT with FP16 precision mode, and the speed is lower than I expected.
Before the optimization I had 7 FPS on inference with a .pb frozen graph. After the TensorRT optimization I have 14 FPS.
Here are the benchmark results for the Jetson NX from their site.
You can see that the unet 256x256 segmentation speed is 146 FPS. I thought the speed of my unet 512x512 should be 4 times slower in the worst case.
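A rough sketch of that estimate, assuming inference cost scales linearly with the number of input pixels (an assumption that ignores any architectural difference between the benchmarked UNet and a MobileNet-based one):

```python
# Benchmark figure quoted above: 146 FPS for a 256x256 UNet.
benchmark_fps = 146.0
benchmark_pixels = 256 * 256
my_pixels = 512 * 512

# Assumption: inference time grows linearly with the number of input pixels.
pixel_ratio = my_pixels / benchmark_pixels
expected_fps = benchmark_fps / pixel_ratio

print(pixel_ratio)   # 4.0
print(expected_fps)  # 36.5
```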
Here is my code for optimizing the TensorFlow saved model using TensorRT:
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf
params = trt.DEFAULT_TRT_CONVERSION_PARAMS
params = params._replace(
    max_workspace_size_bytes=(1<<32))
params = params._replace(precision_mode="FP16")
converter = tf.experimental.tensorrt.Converter(input_saved_model_dir='./model1', conversion_params=params)
converter.convert()
def my_input_fn():
    inp1 = np.random.normal(size=(1, 512, 512, 3)).astype(np.float32)
    yield [inp1]
converter.build(input_fn=my_input_fn) # Generate corresponding TRT engines
output_saved_model_dir = "trt_graph2"
converter.save(output_saved_model_dir) # Generated engines will be saved.
print("------------------------freezing the graph---------------------")
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
saved_model_loaded = tf.saved_model.load(
    output_saved_model_dir, tags=[tf.compat.v1.saved_model.SERVING])
graph_func = saved_model_loaded.signatures[
    tf.compat.v1.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_variables_to_constants_v2(
    graph_func)
frozen_func.graph.as_graph_def()
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./",
                  name="unet_frozen_graphTensorRt.pb",
                  as_text=False)
I downloaded the repository that was used for the Jetson NX benchmarking ( https://github.com/NVIDIA-AI-IOT/jetson_benchmarks ), and the speed of unet256x256 really is ~146 FPS. But it contains no pipeline for optimizing a model.
How can I get similar results? I am looking for a way to get the speed of my model (unet-mobilenet-512x512) close to 30 FPS. Maybe I should run inference some other way (without TensorFlow) or change some conversion parameters?
Any suggestions are welcome, thanks.
As far as I can see, the repository you linked to uses command line tools that use TensorRT (TRT) under the hood. Note that TensorRT is not the same as "TensorRT in TensorFlow" aka TensorFlow-TensorRT (TF-TRT) which is what you are using in your code. Both TF-TRT and TRT models run faster than regular TF models on a Jetson device but TF-TRT models still tend to be slower than TRT ones (source 1, source 2).
The downside of TRT is that the conversion to TRT needs to be done on the target device and that it can be quite difficult to implement it successfully as there are various TensorFlow operations that TRT does not support (in which case you need to write a custom plugin or pray to God that someone on the internet has already done so. …or use TensorRT only for part of your model and do pre-/postprocessing in TensorFlow).
There are basically two ways to convert models from TensorFlow models to TensorRT "engines" aka "plan files", both of which use intermediate formats:
TF -> UFF -> TRT
TF -> ONNX -> TRT
In both cases, the graphsurgeon/onnx-graphsurgeon libraries can be used to modify the TF/ONNX graph to achieve compatibility of graph operations. Unsupported operations can be added by means of TensorRT plugins, as mentioned above. (This is really the main challenge here: Different graph file formats and different target GPUs support different graph operations.)
There's also a third way where you do TF -> Caffe -> TRT and apparently a fourth one where you use Nvidia's Transfer Learning Toolkit (TLT) (based upon TF/Keras) and a tool called tlt-converter but I'm not familiar with it. The latter link does mention converting a UNet model, though.
Note that the paths involving UFF and Caffe are now deprecated and support will be removed in TensorRT 9.0, so if you want something future-proof, you should probably go for ONNX. That being said, most sample code I've come across online still uses UFF, and TensorRT 9.0 is still some time away.
Anyway, I haven't tried converting a UNet to TensorRT yet, but the following repositories provide sample code which might give you an idea of how it works in principle:
TF -> UFF -> TRT: jkjung-avt/tensorrt_demos, NVIDIA-AI-IOT/tf_to_trt_image_classification (the latter using a bit of C++)
TF -> ONNX -> TRT: tensorflow-onnx, onnx-tensorrt
Keras -> ONNX -> TRT: Nvidia blog post (This one mentions converting a Unet to TRT!)
Note that even if you don't manage to pull off the conversion from ONNX to TRT for your model, using the ONNX runtime for inference could potentially still give you a performance gain, especially when you're using the CUDA or the TensorRT execution provider which will be enabled automatically provided you're on a Jetson device and running the correct ONNXRuntime build. (I'm not sure how it compares to TF-TRT or TRT, though, but it might still be worth a shot.)
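A minimal sketch of how the execution provider could be selected with the ONNX Runtime Python API. As far as I know, newer ONNX Runtime releases expect the providers to be passed explicitly rather than picking them up automatically, so this is worth doing either way. The helper names pick_providers and create_session are made up for this example; get_available_providers and InferenceSession are the library's own calls.

```python
# Execution providers in order of preference (ONNX Runtime's own provider names).
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Keep the preferred providers that this ONNX Runtime build offers."""
    return [p for p in PREFERRED_PROVIDERS if p in available]


def create_session(onnx_path):
    """Open an inference session using the best providers available."""
    import onnxruntime as ort  # imported here so pick_providers works without it

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(onnx_path, providers=providers)


# Example with a hypothetical list, as a build without TensorRT might report it.
print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

On the device you would call create_session with the path to your exported .onnx file and then run inference through the returned session.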
Finally, for completeness's sake let me also mention that at least my team has been dabbling with the idea of switching from TF to PyTorch, partly because the Nvidia support has been getting a lot better lately and Nvidia employees seem to gravitate towards PyTorch, too. In particular, there are now two separate ways to convert models to TRT:
PyTorch -> ONNX -> TRT (used by dusty_nv)
PyTorch -> TRT (direct conversion via torch2trt). It seems that quite a few Nvidia repositories use this.
Hi, can you share the errors you are getting? It should work with the following steps:
Convert the TensorFlow/Keras model to a .pb file.
Convert the .pb file to ONNX format.
Create a TensorRT engine.
Run inference from the TensorRT engine.
I am not sure about Unet (I will check) but you may have some operations not supported by onnx (please share your errors).
Here is an example with Resnet-50.
Conversion to .pb:
import tensorflow as tf
import keras
from tensorflow.keras.models import Model
import keras.backend as K
# Note: this example relies on TF 1.x-style APIs (sessions, graph_util, tf.train.write_graph).
K.set_learning_phase(0)
def keras_to_pb(model, output_filename, output_node_names):
    """
    This is the function to convert the Keras model to pb.
    Args:
        model: The Keras model.
        output_filename: The output .pb file name.
        output_node_names: The output nodes of the network. If None, then
            the function gets the last layer name as the output node.
    """
    # Get the names of the input and output nodes.
    in_name = model.layers[0].get_output_at(0).name.split(':')[0]
    if output_node_names is None:
        output_node_names = [model.layers[-1].get_output_at(0).name.split(':')[0]]
    sess = keras.backend.get_session()
    # The TensorFlow freeze_graph expects a comma-separated string of output node names.
    output_node_names_tf = ','.join(output_node_names)
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)
    sess.close()
    wkdir = ''
    tf.train.write_graph(frozen_graph_def, wkdir, output_filename, as_text=False)
    return in_name, output_node_names
# Load the ResNet-50 model pretrained on ImageNet.
model = keras.applications.resnet.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
# Convert the Keras ResNet-50 model to a .pb file.
in_tensor_name, out_tensor_names = keras_to_pb(model, "models/resnet50.pb", None)
Then you need to convert the .pb model to the ONNX format. To do this, you will need to install tf2onnx.
Example:
python -m tf2onnx.convert --input /Path/to/resnet50.pb --inputs input_1:0 --outputs probs/Softmax:0 --output resnet50.onnx
The last step is to create the TensorRT engine from the ONNX file:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
def build_engine(onnx_path, shape = [1,224,224,3]):
    """
    This is the function to create the TensorRT engine
    Args:
        onnx_path : Path to onnx_file.
        shape : Shape of the input of the ONNX file.
    """
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(1) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = (256 << 20)
        with open(onnx_path, 'rb') as model:
            parser.parse(model.read())
        network.get_input(0).shape = shape
        engine = builder.build_cuda_engine(network)
        return engine
def save_engine(engine, file_name):
    # Serialize the engine and write it to disk as a plan file.
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)
def load_engine(trt_runtime, plan_path):
    # Read the serialized engine (plan file) and deserialize it.
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
I suggest you check this PyTorch TRT UNet implementation.

Save trained gensim word2vec model as a tensorflow SavedModel

Is there an option to save a trained Gensim Word2Vec model as a SavedModel using tf.saved_model.save in TF 2.0? In other words, how can I save trained embedding vectors as a SavedModel signature that works with TensorFlow 2.0? As far as I can tell, the following steps are not correct:
model = gensim.models.Word2Vec(...)
model.init_sims(..)
model.train(..)
model.save(..)
module = gensim.models.KeyedVectors.load_word2vec(...)
tf.saved_model.save(
    module,
    export_dir
)
EDIT:
This example helped me figure out how to do it: https://keras.io/examples/nlp/pretrained_word_embeddings/
Gensim does not use TensorFlow and it has its own methods for loading and saving models.
You would need to convert the Gensim embeddings into a TensorFlow model, which only makes sense if you plan to use the embeddings within TensorFlow and possibly fine-tune them for your task.
A Gensim Word2Vec model corresponds to two steps in TensorFlow:
Vocabulary lookup: a table that assigns indices to tokens.
Embedding lookup layer that picks up the actual embeddings for the indices.
Then, you can save it as any other TensorFlow model.
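A framework-free sketch of those two steps, using a made-up three-word vocabulary and made-up vectors. In practice the tokens and vectors would come from the trained Gensim model, and the table and matrix would become a TensorFlow lookup layer and an embedding layer.

```python
# Hypothetical vocabulary and vectors, standing in for a trained Word2Vec model.
tokens = ["cat", "dog", "fish"]
vectors = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

# Step 1: vocabulary lookup, a table that assigns an index to each token.
token_to_index = {token: i for i, token in enumerate(tokens)}


# Step 2: embedding lookup, which picks the matrix row for each index.
def embed(sentence):
    indices = [token_to_index[token] for token in sentence]
    return [vectors[i] for i in indices]


print(embed(["dog", "cat"]))  # [[0.3, 0.4], [0.1, 0.2]]
```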

Is there any way to convert a tensorflow lite (.tflite) file back to a keras file (.h5)?

I lost my dataset through a careless mistake, and only my .tflite file is left. Is there any way to convert it back to an .h5 file? I have done a fair amount of research on this but found no solution.
The conversion from a TensorFlow SavedModel or tf.keras H5 model to .tflite is an irreversible process. Specifically, the original model topology is optimized during the compilation by the TFLite converter, which leads to some loss of information. Also, the original tf.keras model's loss and optimizer configurations are discarded, because those aren't required for inference.
However, the .tflite file still contains some information that can help you restore the original trained model. Most importantly, the weight values are available, although they might be quantized, which could lead to some loss in precision.
The code example below shows you how to read weight values from a .tflite file after it's created from a simple trained tf.keras.Model.
import numpy as np
import tensorflow as tf
# First, create and train a dummy model for demonstration purposes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=[5], activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(loss="binary_crossentropy", optimizer="sgd")
xs = np.ones([8, 5])
ys = np.zeros([8, 1])
model.fit(xs, ys, epochs=1)
# Convert it to a TFLite model file.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("converted.tflite", "wb").write(tflite_model)
# Use `tf.lite.Interpreter` to load the written .tflite back from the file system.
interpreter = tf.lite.Interpreter(model_path="converted.tflite")
all_tensor_details = interpreter.get_tensor_details()
interpreter.allocate_tensors()
for tensor_item in all_tensor_details:
    print("Weight %s:" % tensor_item["name"])
    print(interpreter.tensor(tensor_item["index"])())
These weight values loaded back from the .tflite file can be used with tf.keras.Model.set_weights() method, which will allow you to re-inject the weight values into a new instance of trainable Model that you have in Python. Obviously, this requires you to still have access to the code that defines the model's architecture.

How can I view weights in a .tflite file?

I got the pre-trained .pb file of MobileNet and found that it is not quantized, while the fully quantized model has to be converted into the .tflite format. Since I'm not familiar with tools for mobile app development, how can I get the fully quantized weights of MobileNet from the .tflite file? More precisely, how can I extract the quantized parameters and view their numerical values?
The Netron model viewer has nice view and export of data, as well as a nice network diagram view.
https://github.com/lutzroeder/netron
I'm also in the process of studying how TFLite works. What I found may not be the best approach and I would appreciate any expert opinions. Here's what I found so far using flatbuffer python API.
First you'll need to compile the schema with flatbuffer. The output will be a folder called tflite.
flatc --python tensorflow/contrib/lite/schema/schema.fbs
Then you can load the model and get the tensor you want. Tensor has a method called Buffer() which is, according to the schema,
An index that refers to the buffers table at the root of the model.
So it points you to the location of the data.
from tflite import Model
buf = open('/path/to/mode.tflite', 'rb').read()
model = Model.Model.GetRootAsModel(buf, 0)
subgraph = model.Subgraphs(0)
# Check tensor.Name() to find the tensor_idx you want
tensor = subgraph.Tensors(tensor_idx)
buffer_idx = tensor.Buffer()
buffer = model.Buffers(buffer_idx)
After that you'll be able to read the data by calling buffer.Data()
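The buffer only holds raw bytes, so they still have to be interpreted according to the tensor's type. Below is a minimal sketch using only the standard library, assuming the buffer's bytes have already been collected into a bytes object (named raw here; how you obtain it depends on the generated flatbuffer API). The helper decode_buffer and the FORMATS table are made up for this example.

```python
import struct

# struct format codes for a few TFLite tensor types; the data is little-endian.
FORMATS = {"float32": "f", "int32": "i", "int8": "b", "uint8": "B"}


def decode_buffer(raw_bytes, dtype):
    """Interpret raw buffer bytes as a flat list of numbers of the given type."""
    code = FORMATS[dtype]
    count = len(raw_bytes) // struct.calcsize("<" + code)
    return list(struct.unpack("<%d%s" % (count, code), raw_bytes))


# Stand-in for the bytes of a buffer: three float32 values packed by hand.
raw = struct.pack("<3f", 1.0, -2.5, 0.25)
print(decode_buffer(raw, "float32"))  # [1.0, -2.5, 0.25]
```

The result is flat; reshape it according to the tensor's shape to recover the weight layout.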
Reference:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/schema/schema.fbs
https://github.com/google/flatbuffers/tree/master/samples
Using TensorFlow 2.0, you can extract the weights and some information about each tensor (shape, dtype, name, quantization) with the following script, inspired by the TensorFlow documentation:
import tensorflow as tf
import h5py
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="v3-large_224_1.0_uint8.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# get details for each layer
all_layers_details = interpreter.get_tensor_details()
f = h5py.File("mobilenet_v3_weights_infos.hdf5", "w")
for layer in all_layers_details:
    # to create a group in an hdf5 file
    grp = f.create_group(str(layer['index']))
    # to store layer's metadata in group's metadata
    grp.attrs["name"] = layer['name']
    grp.attrs["shape"] = layer['shape']
    # grp.attrs["dtype"] = layer['dtype']
    grp.attrs["quantization"] = layer['quantization']
    # to store the weights in a dataset
    grp.create_dataset("weights", data=interpreter.get_tensor(layer['index']))
f.close()
You can view it using the Netron app:
macOS: Download the .dmg file or run brew install netron
Linux: Download the .AppImage file or run snap install netron
Windows: Download the .exe installer or run winget install netron
Browser: Start the browser version.
Python Server: Run pip install netron and netron [FILE] or netron.start('[FILE]').

Display a TensorFlow model summary like in Keras

We can build a model with TensorFlow layers. Is there any way to display the model summary like in Keras?
Keras Model Summary
No, there is no such option. TensorFlow is a lot more generic than Keras and allows arbitrary graph architectures, so showing such a structured summary does not make sense for arbitrary TensorFlow graphs. The closest is probably TensorBoard, which has a very handy interactive graph visualization tool.
Keras has been part of TensorFlow for some time, so you can always use nice things like:
model.output_shape # model output shape
model.summary() # model summary representation
model.get_config() # model configuration
model.get_weights() # list all weight tensors in the model