Beam Search Decoder in TensorFlow 2.0

I am looking to implement a sequence-to-sequence neural net with attention and beam search in TensorFlow 2.0 alpha. While the tutorials on the TensorFlow website have been very useful, I am having trouble figuring out the best way to implement beam search, since the contrib library is deprecated. Can anyone point me in the right direction?
I tried TF 2.0's upgrade script to port my TensorFlow 1.x beam search code to 2.0, but it does not support the contrib library.
This is how the beam search code looked in 1.x:
decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    embedding=self.embeddings,
    start_tokens=tf.fill([self.batch_size], tf.constant(2)),
    end_token=tf.constant(3),
    initial_state=initial_state,
    beam_width=self.beam_width,
    output_layer=self.projection_layer
)
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, output_time_major=True, maximum_iterations=summary_max_len, scope=decoder_scope)
self.prediction = tf.transpose(outputs.predicted_ids, perm=[1, 2, 0])

A few TensorFlow 1.x APIs have been moved to different places in TensorFlow 2.x.
tf.contrib is one such library; parts of it were moved to TensorFlow Addons (tfa).
In particular, tf.contrib.seq2seq.BeamSearchDecoder has become tfa.seq2seq.BeamSearchDecoder in TF 2.x, with the following signature:
tfa.seq2seq.BeamSearchDecoder(
    cell: tf.keras.layers.Layer,
    beam_width: int,
    embedding_fn: Optional[Callable] = None,
    output_layer: Optional[tf.keras.layers.Layer] = None,
    length_penalty_weight: tfa.types.FloatTensorLike = 0.0,
    coverage_penalty_weight: tfa.types.FloatTensorLike = 0.0,
    reorder_tensor_arrays: bool = True,
    **kwargs
)
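For reference, a rough TFA counterpart of the 1.x snippet above might look like the sketch below. Names such as decoder_cell, self.embeddings, initial_state, self.projection_layer and summary_max_len are carried over from the question; the maximum_iterations argument and the tile_batch call are assumptions based on the TFA docs, so treat this as a sketch rather than a drop-in replacement.

import tensorflow as tf
import tensorflow_addons as tfa

decoder = tfa.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    beam_width=self.beam_width,
    output_layer=self.projection_layer,
    maximum_iterations=summary_max_len)   # forwarded to dynamic decoding (assumption)

# With beam search, the initial state has to be tiled across the beam dimension.
tiled_initial_state = tfa.seq2seq.tile_batch(initial_state, multiplier=self.beam_width)

# Calling the decoder instance runs dynamic decoding directly.
outputs, _, _ = decoder(
    self.embeddings,                              # embedding matrix used to look up sampled ids
    start_tokens=tf.fill([self.batch_size], 2),
    end_token=3,
    initial_state=tiled_initial_state)

# predicted_ids comes back batch-major, roughly [batch, time, beam_width];
# transpose to [batch, beam_width, time] to match the 1.x snippet.
self.prediction = tf.transpose(outputs.predicted_ids, perm=[0, 2, 1])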

Related

Jetson NX: optimize a TensorFlow model using TensorRT

I am trying to speed up a segmentation model (unet-mobilenet-512x512). I converted my TensorFlow model to TensorRT with FP16 precision mode, but the speed is lower than I expected.
Before the optimization I had 7 FPS on inference with the .pb frozen graph. After the TensorRT optimization I have 14 FPS.
Here are the benchmark results for the Jetson NX from their site.
You can see that the unet 256x256 segmentation speed is 146 FPS. I thought the speed of my unet 512x512 should be at most 4 times slower in the worst case.
Here is my code for optimizing the TensorFlow saved model using TensorRT:
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf

params = trt.DEFAULT_TRT_CONVERSION_PARAMS
params = params._replace(max_workspace_size_bytes=(1 << 32))
params = params._replace(precision_mode="FP16")

converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir='./model1', conversion_params=params)
converter.convert()

def my_input_fn():
    inp1 = np.random.normal(size=(1, 512, 512, 3)).astype(np.float32)
    yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
output_saved_model_dir = "trt_graph2"
converter.save(output_saved_model_dir)  # Generated engines will be saved.

print("------------------------freezing the graph---------------------")

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

saved_model_loaded = tf.saved_model.load(
    output_saved_model_dir, tags=[tf.compat.v1.saved_model.SERVING])
graph_func = saved_model_loaded.signatures[
    tf.compat.v1.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_variables_to_constants_v2(graph_func)
frozen_func.graph.as_graph_def()
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./",
                  name="unet_frozen_graphTensorRt.pb",
                  as_text=False)
I downloaded the repository that was used for the Jetson NX benchmarking (https://github.com/NVIDIA-AI-IOT/jetson_benchmarks), and the speed of unet 256x256 really is ~146 FPS. But there is no pipeline to optimize the model.
How can I get similar results? I am looking for solutions to get the speed of my model (unet-mobilenet-512x512) close to 30 FPS. Maybe I should run inference in another way (without TensorFlow), or change some conversion parameters?
Any suggestions, thanks.
As far as I can see, the repository you linked to uses command line tools that use TensorRT (TRT) under the hood. Note that TensorRT is not the same as "TensorRT in TensorFlow" aka TensorFlow-TensorRT (TF-TRT) which is what you are using in your code. Both TF-TRT and TRT models run faster than regular TF models on a Jetson device but TF-TRT models still tend to be slower than TRT ones (source 1, source 2).
The downside of TRT is that the conversion to TRT needs to be done on the target device and that it can be quite difficult to implement it successfully as there are various TensorFlow operations that TRT does not support (in which case you need to write a custom plugin or pray to God that someone on the internet has already done so. …or use TensorRT only for part of your model and do pre-/postprocessing in TensorFlow).
There are basically two ways to convert models from TensorFlow models to TensorRT "engines" aka "plan files", both of which use intermediate formats:
TF -> UFF -> TRT
TF -> ONNX -> TRT
In both cases, the graphsurgeon/onnx-graphsurgeon libraries can be used to modify the TF/ONNX graph to achieve compatibility of graph operations. Unsupported operations can be added by means of TensorRT plugins, as mentioned above. (This is really the main challenge here: Different graph file formats and different target GPUs support different graph operations.)
There's also a third way where you do TF -> Caffe -> TRT and apparently a fourth one where you use Nvidia's Transfer Learning Toolkit (TLT) (based upon TF/Keras) and a tool called tlt-converter but I'm not familiar with it. The latter link does mention converting a UNet model, though.
Note that the paths involving UFF and Caffe are now deprecated and support will be removed in TensorRT 9.0, so if you want something future-proof, you should probably go for ONNX. That being said, most sample code I've come across online still uses UFF, and TensorRT 9.0 is still some time away.
Anyway, I haven't tried converting a UNet to TensorRT yet, but the following repositories provide sample code which might give you an idea of how it works in principle:
TF -> UFF -> TRT: jkjung-avt/tensorrt_demos, NVIDIA-AI-IOT/tf_to_trt_image_classification (the latter using a bit of C++)
TF -> ONNX -> TRT: tensorflow-onnx, onnx-tensorrt
Keras -> ONNX -> TRT: Nvidia blog post (This one mentions converting a Unet to TRT!)
Note that even if you don't manage to pull off the conversion from ONNX to TRT for your model, using the ONNX runtime for inference could potentially still give you a performance gain, especially when you're using the CUDA or the TensorRT execution provider which will be enabled automatically provided you're on a Jetson device and running the correct ONNXRuntime build. (I'm not sure how it compares to TF-TRT or TRT, though, but it might still be worth a shot.)
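If you want to give the ONNX Runtime route a quick try, a minimal inference sketch could look like this (the model path and input shape are placeholders; the TensorRT/CUDA providers are only picked up if the onnxruntime build on the Jetson was compiled with them):

import numpy as np
import onnxruntime as ort

# Providers are tried in order; TensorRT and CUDA are only available if the
# onnxruntime build on the device supports them, otherwise it falls back to CPU.
session = ort.InferenceSession(
    "unet_mobilenet_512.onnx",          # placeholder model path
    providers=["TensorrtExecutionProvider",
               "CUDAExecutionProvider",
               "CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 512, 512, 3).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])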
Finally, for completeness's sake let me also mention that at least my team has been dabbling with the idea of switching from TF to PyTorch, partly because the Nvidia support has been getting a lot better lately and Nvidia employees seem to gravitate towards PyTorch, too. In particular, there are now two separate ways to convert models to TRT:
PyTorch -> ONNX -> TRT (used by dusty_nv)
PyTorch -> TRT (direct conversion via torch2trt). It seems that quite a few Nvidia repositories use this.
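To give an idea of the second path, torch2trt usage is quite compact. A minimal sketch, with a torchvision model standing in for an actual network, might look like this:

import torch
from torch2trt import torch2trt
from torchvision.models import resnet18

# Any eager PyTorch model in eval mode on the GPU works as an example here.
model = resnet18(pretrained=True).eval().cuda()

# torch2trt traces the model with an example input and builds a TRT engine.
x = torch.randn(1, 3, 224, 224).cuda()
model_trt = torch2trt(model, [x], fp16_mode=True)

# The returned wrapper can be called like a normal module.
y_trt = model_trt(x)

# The converted model can be saved and restored via its state_dict.
torch.save(model_trt.state_dict(), "resnet18_trt.pth")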
Hi, can you share the errors you are getting? It should work with the following steps:
Convert the TensorFlow/Keras model to a .pb file.
Convert the .pb file to ONNX format.
Create a TensorRT engine.
Run inference from the TensorRT engine.
I am not sure about Unet (I will check), but you may have some operations that are not supported by ONNX (please share your errors).
Here is an example with Resnet-50.
Conversion to .pb:
import tensorflow as tf
import keras
from tensorflow.keras.models import Model
import keras.backend as K

K.set_learning_phase(0)

def keras_to_pb(model, output_filename, output_node_names):
    """
    This is the function to convert the Keras model to pb.

    Args:
        model: The Keras model.
        output_filename: The output .pb file name.
        output_node_names: The output nodes of the network. If None, then
            the function gets the last layer name as the output node.
    """
    # Get the names of the input and output nodes.
    in_name = model.layers[0].get_output_at(0).name.split(':')[0]
    if output_node_names is None:
        output_node_names = [model.layers[-1].get_output_at(0).name.split(':')[0]]

    sess = keras.backend.get_session()

    # The TensorFlow freeze_graph expects a comma-separated string of output node names.
    output_node_names_tf = ','.join(output_node_names)

    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)
    sess.close()

    wkdir = ''
    tf.train.write_graph(frozen_graph_def, wkdir, output_filename, as_text=False)
    return in_name, output_node_names

# Load the ResNet-50 model pretrained on ImageNet.
model = keras.applications.resnet.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)

# Convert the Keras ResNet-50 model to a .pb file.
in_tensor_name, out_tensor_names = keras_to_pb(model, "models/resnet50.pb", None)
Then you need to convert the .pb model to the ONNX format. To do this, you will need to install tf2onnx.
Example:
python -m tf2onnx.convert --input /Path/to/resnet50.pb --inputs input_1:0 --outputs probs/Softmax:0 --output resnet50.onnx
The last step is to create the TensorRT engine from the ONNX file:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

def build_engine(onnx_path, shape=[1, 224, 224, 3]):
    """
    This is the function to create the TensorRT engine.

    Args:
        onnx_path: Path to the ONNX file.
        shape: Shape of the input of the ONNX file.
    """
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(1) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = (256 << 20)
        with open(onnx_path, 'rb') as model:
            parser.parse(model.read())
        network.get_input(0).shape = shape
        engine = builder.build_cuda_engine(network)
        return engine

def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)

def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
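Putting those helpers together, end-to-end usage would then look roughly like this (file names are placeholders):

# Build the engine from the ONNX file produced by tf2onnx, then persist it.
engine = build_engine("resnet50.onnx", shape=[1, 224, 224, 3])
save_engine(engine, "resnet50.plan")

# Later (e.g. on the Jetson at inference time), reload the serialized plan.
engine = load_engine(trt_runtime, "resnet50.plan")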
I suggest you check this PyTorch TRT Unet implementation.

How can TensorFlow Lite use bicubic resize?

I'm using TensorFlow Lite for Android. The official Python code runs the following before creating the tf.lite.Interpreter, but it relies on the TensorFlow module, which is not available in Android Java. How can I implement the bicubic resize method with TensorFlow Lite?
img_resized = tf.image.resize(img, [width, height], method='bicubic', preserve_aspect_ratio=False)
img_input = img_resized.numpy()
reshape_img = img_input.reshape(1, width, height, 3)
tensor = tf.convert_to_tensor(reshape_img, dtype=tf.float32)
# load model
print("Load model...")
interpreter = tf.lite.Interpreter(model_path=model_name)
The official Python code uses model.tflite. I only found these two resize methods in Android TensorFlow Lite, neither of which is bicubic:
ResizeOp.ResizeMethod.BILINEAR
ResizeOp.ResizeMethod.NEAREST_NEIGHBOR
TFLite does not have bicubic resize. Three options here:
1. If resizing is part of preprocessing, do it on the Java side.
2. Try to use Select TF ops: set converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS] during conversion, and add implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly' to your build.gradle (details in the docs); see the conversion sketch after this list.
3. If options 1 and 2 do not work, the friendly community will suggest that you implement it yourself.
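For option 2, a minimal conversion sketch on the Python side could look like the following (assuming a SavedModel; the paths are placeholders):

import tensorflow as tf

# Convert a SavedModel while allowing selected TensorFlow ops (such as the
# bicubic resize kernel) to be embedded alongside the TFLite builtins.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
]
tflite_model = converter.convert()

with open("model_with_select_ops.tflite", "wb") as f:
    f.write(tflite_model)

On the Android side you then add the tensorflow-lite-select-tf-ops dependency mentioned above so the interpreter can resolve the embedded TF ops.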

Yolov3 to TensorRT: Custom Plugins for tf-keras Lambda Layers

I have trained a yolov3-tiny model in Tensorflow 2.0 using this repo : https://github.com/zzh8829/yolov3-tf2
At inference time, the model uses two functions wrapped in tf-keras Lambda layers for postprocessing. These are:
yolo_boxes : to calculate actual box coordinates from the offsets outputted by the model
yolo_nms : do nonmax-suppression using tf.image.combined_non_max_suppression
boxes_0 = Lambda(lambda x: yolo_boxes(x, anchors[masks[0]], classes),name='yolo_boxes_0')(output_0)
boxes_1 = Lambda(lambda x: yolo_boxes(x, anchors[masks[1]], classes),name='yolo_boxes_1')(output_1)
outputs = Lambda(lambda x: yolo_nms(x, anchors, masks, classes),name='yolo_nms')((boxes_0[:3], boxes_1[:3]))
I have created a frozen pb of this inference model, and converted it to ONNX. But I cannot figure out how to proceed.
How do I create a Python TensorRT plugin for yolo_boxes? I cannot find any material online on plugins for Lambda layers, and I cannot test TensorRT's custom NMS plugin without the yolo_boxes plugin first.
I have done this in the past for YOLOv3 in two ways (both also work for yolov4 and yolov3-tiny):
https://github.com/jkjung-avt/tensorrt_demos
https://github.com/Tianxiaomo/pytorch-YOLOv4
They both first convert to ONNX and then to TensorRT.
For the second link you will need Pytorch.
Note that the right versions of ONNX and TensorRT are required to make this work. Old versions of ONNX do not have the right opset to work. But this information can all be found on those two links.
Thank you for your help, @joostblack.
Referring to your resources and a few others, I was able to use the BatchedNMSDynamic_TRT plugin to solve my problem. I simply made the inputs to the combined_non_max_suppression function (the one inside yolo_nms) the outputs of the model, and used graphsurgeon to append the BatchedNMSDynamic_TRT plugin as a node, feeding the model's outputs into the plugin.
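For anyone attempting the same, a rough onnx-graphsurgeon sketch of appending the plugin node is shown below. The file names, tensor names, and plugin attribute values are illustrative assumptions and have to be adapted to the actual exported graph; treat it as a starting point rather than a working recipe.

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("yolov3_tiny.onnx"))  # hypothetical file name

# Assume the exported graph ends with the box/score tensors that previously fed
# combined_non_max_suppression; they are looked up here by assumed names.
tensors = graph.tensors()
boxes = tensors["yolo_boxes"]     # assumed tensor name
scores = tensors["yolo_scores"]   # assumed tensor name

# Output tensors produced by the plugin (shapes depend on keepTopK/numClasses).
num_dets = gs.Variable(name="num_detections", dtype=np.int32)
det_boxes = gs.Variable(name="nmsed_boxes", dtype=np.float32)
det_scores = gs.Variable(name="nmsed_scores", dtype=np.float32)
det_classes = gs.Variable(name="nmsed_classes", dtype=np.float32)

# Append the TensorRT batched NMS plugin as a regular ONNX node; the parser
# resolves the op name to the plugin at engine-build time.
nms_node = gs.Node(
    op="BatchedNMSDynamic_TRT",
    attrs={
        "shareLocation": True,
        "backgroundLabelId": -1,
        "numClasses": 80,        # assumption: COCO-style class count
        "topK": 1024,
        "keepTopK": 100,
        "scoreThreshold": 0.25,  # illustrative thresholds
        "iouThreshold": 0.45,
        "isNormalized": True,
        "clipBoxes": True,
    },
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes],
)
graph.nodes.append(nms_node)
graph.outputs = [num_dets, det_boxes, det_scores, det_classes]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov3_tiny_nms.onnx")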

How to Implement Scheduled Sampling in Keras?

How can I implement the concept of Scheduled Sampling while training an LSTM Encoder-Decoder with teacher forcing in Keras?
The source paper is here: https://arxiv.org/abs/1506.03099
From this post, I see that pure TensorFlow supports this training helper:
scheduled sampling in Tensorflow
https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/ScheduledOutputTrainingHelper
Is there a way to do this in Keras, maybe using a custom backend function?
The TensorFlow contrib API is deprecated in TensorFlow 2.x. Some tf.contrib libraries were moved to TensorFlow Addons.
Use tfa.seq2seq.ScheduledOutputTrainingSampler:
tfa.seq2seq.ScheduledOutputTrainingSampler(
    sampling_probability: tfa.types.TensorLike,
    time_major: bool = False,
    seed: Optional[int] = None,
    next_inputs_fn: Optional[Callable] = None
)
For more information on the library, see here.
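As a rough sketch of how the sampler slots into a TFA decoder (the cell size and sampling probability below are illustrative; note that ScheduledOutputTrainingSampler feeds raw outputs back as inputs, while tfa.seq2seq.ScheduledEmbeddingTrainingSampler is the variant for feeding back embedded token ids):

import tensorflow as tf
import tensorflow_addons as tfa

units = 256                                   # illustrative size
decoder_cell = tf.keras.layers.LSTMCell(units)

# With probability 0.25, feed the decoder's own output back in as the next
# input instead of the ground-truth input (scheduled sampling on outputs).
# Because outputs are fed back directly, they must have the same size as the
# inputs, or a next_inputs_fn must map them back to input space.
sampler = tfa.seq2seq.ScheduledOutputTrainingSampler(sampling_probability=0.25)

# The sampler takes the place of the plain TrainingSampler in a BasicDecoder;
# the decoder is then called on the teacher-forced targets in the training loop.
decoder = tfa.seq2seq.BasicDecoder(decoder_cell, sampler)

In practice, sampling_probability is usually a tf.Variable that you increase over the course of training, following the schedule from the paper.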

Migrating bidirectional_dynamic_rnn to tf.keras.layers.Bidirectional

I have used BasicLSTMCell, MultiRNNCell and bidirectional_dynamic_rnn in a proof-of-concept, which was a success, but now, to publish production-level code, I need to update these bidirectional layers for the upcoming TensorFlow 2.0.
For now, TensorFlow shows that these layers are deprecated and will be removed in a future version (TF 2.0). For these libraries in particular, the upgrade instruction was to use keras.layers.StackedRNNCells, which is not working for me.
cells = [tf.contrib.rnn.LSTMCell(num_units=numHidden, state_is_tuple=True) for _ in range(2)] # 2 layers
stacked = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
((fw, bw), _)=tf.nn.bidirectional_dynamic_rnn(cell_fw=stacked,cell_bw=stacked, inputs=rnnIn3d, dtype=rnnIn3d.dtype)
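For reference, a possible tf.keras equivalent of that snippet is sketched below; it wraps a StackedRNNCells stack in an RNN layer and then in Bidirectional. Unlike the 1.x code, which reuses the same stacked cell for both directions, Bidirectional builds its own backward copy, and merge_mode=None is used to get the forward and backward outputs separately:

import tensorflow as tf

numHidden = 256                                                    # illustrative size
cells = [tf.keras.layers.LSTMCell(numHidden) for _ in range(2)]    # 2 layers
stacked = tf.keras.layers.StackedRNNCells(cells)

bidi = tf.keras.layers.Bidirectional(
    tf.keras.layers.RNN(stacked, return_sequences=True),
    merge_mode=None)                                               # keep fw and bw outputs separate

# rnnIn3d: a [batch, time, features] tensor, as in the original snippet.
rnnIn3d = tf.random.normal([8, 50, 128])
fw, bw = bidi(rnnIn3d)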