Upsampling Tensor for TensorRT - tensorflow

The Tensorflow model is converted to TensorRT, and Tensorflow's ResizeArea op (the upsample in the picture) needs to be implemented as a plugin.
So ResizeArea is implemented in CUDA.
My TensorRT input is NCHW format.
uff_path = model_to_uff(model_path)
parser.register_input(ModelData.INPUT_NAME, (3, height, width), trt.UffInputOrder.NCHW)
parser.register_output(ModelData.OUTPUT_NAME)
parser.parse(uff_path, network)
So my CUDA code implements NCHW resampling.
I'd like to make sure my resampling format is correct.
Method_1
NCHW ResizeArea (4x upsampling) sample. Input, where each channel is 3x4:
channel_1 channel_2 channel_3
3,1,2,0, 0,4,3,1, 2,0,2,3,
3,0,1,2, 0,1,2,1, 2,0,4,2,
4,1,2,2, 1,3,2,4, 2,3,4,2,
Output after 4x upsampling, where each channel is 12x16:
channel_1 channel_2 channel_3
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
Each pixel is upsampled 4 times (for example, the first pixel 3 is repeated 4 times horizontally and 4 times vertically). That is what I consider NCHW-format upsampling.
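A quick numpy sanity check of what Method_1 should produce, using the same sample values as above (only a host-side verification sketch; the actual plugin stays in CUDA):
import numpy as np

# One sample in NCHW with the batch dimension dropped: shape (C, H, W) = (3, 3, 4).
x = np.array([[[3, 1, 2, 0],
               [3, 0, 1, 2],
               [4, 1, 2, 2]],
              [[0, 4, 3, 1],
               [0, 1, 2, 1],
               [1, 3, 2, 4]],
              [[2, 0, 2, 3],
               [2, 0, 4, 2],
               [2, 3, 4, 2]]])

# Repeat every pixel 4 times along H (axis 1) and W (axis 2), independently per channel.
up = np.repeat(np.repeat(x, 4, axis=1), 4, axis=2)
print(up.shape)   # (3, 12, 16)
print(up[0])      # should match the channel_1 block in the output above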
Method_2
Another implementation treats the data as NHWC: the 3-channel pixel (83,86,77) is upsampled horizontally and vertically as a unit.
Is Method_1 the correct way to do NCHW upsampling?

It seems Method_1 is OK, since TensorRT expects CHW as stated in the docs; NHWC is a TF format. Are you considering AlignCorners in your plugin layer? Also note that this resizing is nearest neighbor. In PyTorch I used onnx-trt to do bilinear interpolation, which gave better results (for segmentation; maybe for your case nearest neighbor is OK).

After taking some time, the issue was solved. TensorRT works in NCHW format, while the Tensorflow model is in NHWC format. So my plugin needs to do the upsampling in NCHW format, but its output needs to be converted to NHWC format so that it can interface with the next Tensorflow operations.
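For illustration, the layout change from the plugin's NCHW output to the NHWC layout expected by the following Tensorflow ops corresponds to this permutation (shown in numpy; the real plugin would do the equivalent index math in CUDA, and the shapes here are placeholders):
import numpy as np

out_nchw = np.zeros((1, 3, 12, 16), dtype=np.float32)   # N, C, H, W
out_nhwc = np.transpose(out_nchw, (0, 2, 3, 1))          # N, H, W, C
print(out_nhwc.shape)  # (1, 12, 16, 3)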

Related

Channels dimension index in the input shape while porting Pytorch models to Tensorflow

One of the major problems I've encountered when converting PyTorch models to TensorFlow through ONNX is slowness, which appears to be related to the input shape, even though I was able to get bit-exact outputs with the two frameworks.
While the PyTorch input shape is B,C,H,W, the Tensorflow input shape is B,H,W,C, where B,C,H,W stand for batch size, channels, height and width, respectively. Technically, I solve the input shape problem easily when working in Tensorflow, using two calls to np.swapaxes:
# Single image in H,W,C order (Tensorflow layout); no batch size here yet
image = np.swapaxes(image, 0, 2) # Swapping C and H dimensions - result: C,W,H
image = np.swapaxes(image, 1, 2) # Swapping H and W dimensions - result: C,H,W (like Pytorch)
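Equivalently, a single np.transpose call does the same permutation and also extends to batched input (the shapes below are just examples):
import numpy as np

# Single image: H,W,C -> C,H,W in one call.
image_hwc = np.zeros((224, 224, 3), dtype=np.float32)
image_chw = np.transpose(image_hwc, (2, 0, 1))

# Batched input: B,H,W,C -> B,C,H,W.
batch_nhwc = np.zeros((8, 224, 224, 3), dtype=np.float32)
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))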
The slowness problem seems to be related to the differences in the ways the convolutional operations are implemented in PyTorch vs Tensorflow. While PyTorch expects channels first, Tensorflow expects channels last.
As a result, when I visualize the models using Netron, the ONNX model looks clean and makes sense (first image), whereas the Tensorflow .pb formatted model looks like a big mess (second image).
Note: It appears that this problem has already concerned the writers of the onnx2keras library, which supports an experimental feature for changing the C,H,W ordering originating in Pytorch into H,W,C.
Any idea how to overcome this limitation? Are there other options for more abstractly exporting PyTorch models into Tensorflow?
ONNX (from PyTorch) - you can see the straight flow and the residual blocks:
Tensorflow (imported from the ONNX model) - almost nothing looks like a series of predefined operations:

Channels first vs Channels last - what do these mean?

https://software.intel.com/en-us/forums/computer-vision/topic/785538
"The problem has been resolved. It's because the model I use uses channels_first as default for GPU training, while OPENVINO requires channels_last for TF models."
What do these mean?
How can I change them?
I cannot find any further references to this on the net.
Channels first means that in a specific tensor (consider a photo), you would have (Number_Of_Channels, Height, Width).
Channels last means channels are in the last position in a tensor (n-dimensional array).
Examples:
(3,360,720) --- Channels first
(360,720,3) --- Channels last
where 3 comes from RGB (coloured image).
TensorFlow uses the channels-last setting by default in its configuration.
The issue comes from the fact that some now-obsolete frameworks (such as Theano) used a channels-first approach; porting was a problem, particularly for newbies.
The solution to your problem would be to re-train your model in "channels_last" format.
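For reference, if the model is built with tf.keras, the layout can be set globally or per layer; a minimal sketch (the shapes are just examples):
import tensorflow as tf

# Global default layout for Keras layers ('channels_last' is already the default).
tf.keras.backend.set_image_data_format('channels_last')

# Or set the layout explicitly on a layer.
conv = tf.keras.layers.Conv2D(
    filters=32,
    kernel_size=3,
    data_format='channels_last',   # use 'channels_first' for NCHW
    input_shape=(360, 720, 3))     # H, W, C when channels_last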
You can convert a TF model with NCHW layout to IR by using --disable_nhwc_to_nchw with the Model Optimizer.
NCHW - channel first
NHWC - channel last
N: batch_size, C: number of channels, H: input image height, W: input image width
By default, the MKLDNN plugin uses the NCHW data layout.

TensorRT weight order compared to Tensorflow [H,W,IN_C,OUT_C]

I have to manually add a convolution layer because I have a special operation that is not supported by the Tensorflow parser. In what order does TensorRT expect to read the weights from a .wts file? For example, a conv2d weight tensor in Tensorflow typically has the order [H,W,IN_CHANNEL,OUT_CHANNEL]. I know that TensorRT expects the input data to be in NCHW order, but does the order of the weights have to be changed too when they are written to the .wts file? If so, what order does TensorRT expect? [IN_CHANNEL,OUT_CHANNEL,H,W]?
Quick summary ... if you are asking about weight sort order, you may be concerned with input data order as well. The answer posted here probably gets you most of what you need on both counts: Run Tensorflow with NVIDIA TensorRT Inference Engine
Additional details … I recently worked through these issues using custom tools, and here are the relevant factors I encountered:
input image data order, which is NHWC for tensorflow and NCHW for tensorrt; and, within the channel dimension of the image, the color order, e.g. RGB vs BGR
weights sort orders by layer
for a 2D convolution, tensorflow uses RSCK ([filter_height, filter_width, input_depth, output_depth]) and tensorrt uses KCRS (see the transpose sketch after this list).
for a dense layer following a 2D convolution or pooling layer, adjust the weights sort order for a different flattening sequence, effectively converting RSCK for tensorflow to KCRS for tensorrt, where now R and S refer to the entire input layer height and width, respectively, the C is the input_depth as before, and now the output depth K is the neuron count of the dense layer
for a dense layer following dense layer, convert CK to KC order
(note: this answer assumes you are not using groups in any of the convolutions)
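A minimal sketch of the 2D-convolution case, assuming an ungrouped convolution and a numpy array holding the tensorflow kernel (the shape is hypothetical):
import numpy as np

# Tensorflow conv2d kernel in RSCK order: [filter_height, filter_width, input_depth, output_depth].
tf_kernel = np.random.rand(3, 3, 16, 32).astype(np.float32)

# Reorder to tensorrt's KCRS order: [output_depth, input_depth, filter_height, filter_width].
trt_kernel = np.transpose(tf_kernel, (3, 2, 0, 1))

# Flatten row-major before writing; the exact .wts file format depends on your tooling.
flat = np.ascontiguousarray(trt_kernel).ravel()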

Input image of a fully quantized tensorflow lite model

I've trained a simple CNN model on Cifar-10 in tensorflow with fake quantization (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize). I then generated a .tflite file using toco. Now I want to use the Python interpreter to test the tflite model.
Since I used tf.image.per_image_standardization to subtract the mean and divide by the standard deviation during training, I need to do the same thing to the testing data, right? But the problem is that my model is already fully quantized by tflite, and it only takes uint8 data as input. To do image standardization, I need to convert my image to float32. So how do I convert it back to uint8, or is image standardization even necessary for the testing data in this case? Thanks.
So, it turns out I need to do standardization on the testing data to get good accuracy.
To do it, I directly feed the uint8 input images to the tf.image.per_image_standardization function. The function converts the uint8 data to float32 and then does the standardization (subtract the mean, divide by the std). You can find the source code of the function here: https://github.com/tensorflow/tensorflow/blob/r1.11/tensorflow/python/ops/image_ops_impl.py
Now I have the standardized float32 input images. What I did was write a quantization function to quantize the float32 images back to uint8. The math comes from this paper: https://arxiv.org/abs/1803.08607
With the standardized uint8 input images, I then use the tflite interpreter Python API to test the model. It works as expected.
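A rough sketch of that flow, assuming TF 2.x eager execution, a hypothetical model file name, and a model whose uint8 input carries the usual scale/zero-point quantization parameters:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")  # hypothetical path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Stand-in for a CIFAR-10 test image in H,W,C, uint8.
image_u8 = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)

# per_image_standardization casts to float32 and computes (x - mean) / adjusted_stddev.
image_f32 = tf.image.per_image_standardization(image_u8).numpy()

# Quantize back to uint8 with the input tensor's (scale, zero_point) parameters.
scale, zero_point = input_details['quantization']
image_q = np.clip(np.round(image_f32 / scale + zero_point), 0, 255).astype(np.uint8)

interpreter.set_tensor(input_details['index'], image_q[np.newaxis, ...])
interpreter.invoke()
logits = interpreter.get_tensor(output_details['index'])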

Float ops found in quantized TensorFlow MobileNet model

As you can see in the screenshot of a quantized MobileNet model implemented in TensorFlow, there are still some float operations. The quantization is done in TensorFlow via the graph_transform tools.
The red ellipse in the image is described in the right-hand-side text box. The "depthwise" node is a "DepthwiseConv2dNative" operation that expects "DT_FLOAT" inputs.
Although the lower Relu6 performs an 8-bit quantized operation, the result has to go through "(Relu6)", which is a "Dequantize" op, in order to produce "DT_FLOAT" inputs for the depthwise convolution.
Why are depthwise conv operations left out by the TF graph_transform tools? Thank you.
Unfortunately there isn't a quantized version of depthwise conv in standard TensorFlow, so it falls back to the float implementation with conversions before and after. For a full eight-bit implementation of MobileNet, you'll need to look at TensorFlow Lite, which you can learn more about here:
https://www.tensorflow.org/mobile/tflite/