TensorRT weight order compared to Tensorflow [H,W,IN_C,OUT_C] - tensorflow

I have to manually add a convolution layer because I have a special operation that is not supported by the Tensorflow parser. In what order does TensorRT expect to read the weights from a .wts file? For example, a conv2d weight tensor in Tensorflow typically has the order [H, W, IN_CHANNEL, OUT_CHANNEL]. I know that TensorRT expects the input data to be in NCHW order, but does the order of the weights have to be changed too when they are written to the .wts file? If so, what order does TensorRT expect? [IN_CHANNEL, OUT_CHANNEL, H, W]?

Quick summary: if you are asking about weight sort order, you may be concerned with input data order as well. The answer posted here probably gets you most of what you need on both counts: Run Tensorflow with NVIDIA TensorRT Inference Engine
Additional details: I recently worked through these issues using custom tools, and here are the relevant factors I encountered:
input image data order, which is NHWC for Tensorflow and NCHW for TensorRT, and, within the channel dimension, the color order, e.g. RGB vs BGR
weight sort orders by layer:
for a 2D convolution, Tensorflow uses RSCK ([filter_height, filter_width, input_depth, output_depth]) and TensorRT uses KCRS
for a dense layer following a 2D convolution or pooling layer, adjust the weight sort order to account for the different flattening sequence, effectively converting RSCK for Tensorflow to KCRS for TensorRT, where R and S now refer to the entire input layer height and width, respectively, C is the input_depth as before, and the output depth K is the neuron count of the dense layer
for a dense layer following a dense layer, convert CK to KC order
(note: this answer assumes you are not using groups in any of the convolutions; see the sketch below)
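A minimal NumPy sketch of these reorderings (assuming no groups; the shapes and array names are illustrative, not taken from any particular model):

import numpy as np

# 2D convolution: Tensorflow RSCK -> TensorRT KCRS
w_conv_rsck = np.random.randn(3, 3, 16, 32).astype(np.float32)  # [R,S,C,K]
w_conv_kcrs = w_conv_rsck.transpose(3, 2, 0, 1)                 # [K,C,R,S]

# Dense layer following a conv/pool layer whose output is [R,S,C]:
# reorder so the flattening matches TensorRT's CHW activation layout
R, S, C, K = 7, 7, 64, 128
w_dense = np.random.randn(R * S * C, K).astype(np.float32)  # TF flattens HWC
w_dense_trt = (w_dense
               .reshape(R, S, C, K)     # undo TF's HWC flattening
               .transpose(3, 2, 0, 1)   # -> [K,C,R,S]
               .reshape(K, R * S * C))  # re-flatten in CHW order

# Dense layer following a dense layer: CK -> KC
w_fc = np.random.randn(256, 10).astype(np.float32)  # TF: [C,K]
w_fc_trt = w_fc.T                                   # [K,C]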

Related

Convert 2D Convolutional Neural Networks to 1D Convolutional Neural Networks in Tensorflow

Say I have some feature extracted and it is 10x10 data (maybe an image or a cepstrogram).
Usually I would feed this into my 2D conv and I'd be on my way.
My question is: if I had to convert this into 1D with 100 inputs, what disadvantages would I get, besides the obvious part where my filter would not be detecting the surrounding neighbors but only the previous and the next ones, which might lead to worse performance?
And if I had to do this, would I just reshape, use a reshape layer, or use a permute layer?
Thanks
Yes, you are correct regarding the GNA: our Intel GNA hardware natively supports only 1D convolutions, and 2D convolution support is experimental.
This article (GNA Plugin - OpenVINO™ Toolkit) specifies the steps to add Permute layers before or after convolutions.
You could try both methods and see which one works for you.
Generally, the 1D convolution in TensorFlow is implemented as a 2D convolution wrapped in reshape layers that add an H dimension before the 2D convolution and remove it afterward, as sketched below.
At the same time, MO (the OpenVINO Model Optimizer) inserts permutes before and after the reshape layers, since they change the interpretation of the data.
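A minimal sketch of that pattern in plain TensorFlow (the shapes are illustrative; this mimics what the framework does internally):

import tensorflow as tf

x = tf.random.normal([8, 100, 16])    # [batch, width, channels]
w = tf.random.normal([1, 3, 16, 32])  # conv2d filter [H=1, W, in, out]

x4d = tf.reshape(x, [8, 1, 100, 16])  # add an H=1 dimension -> NHWC
y4d = tf.nn.conv2d(x4d, w, strides=[1, 1, 1, 1], padding="SAME")
y = tf.reshape(y4d, [8, 100, 32])     # remove the H dimension again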
For advantages & disadvantages of 2D/1D CNN you may refer to this detailed thread
In TensorFlow, these are the steps to build a CNN architecture:
Reshape the input if necessary using tf.reshape() to match the convolutional layer you intend to build (for example, for a 2D convolution, reshape it into the four-dimensional format [Batch, Height, Width, Channel])
Create a convolutional layer using tf.nn.conv1d(), tf.nn.conv2d(), or tf.nn.conv3d(), depending on the dimensionality of the input
Create a pooling layer using tf.nn.max_pool()
Repeat steps 2 and 3 for additional convolution and pooling layers
Reshape the output of the convolution and pooling layers, flattening it to prepare for the fully connected layer
Create a fully connected layer using the tf.matmul() function, add an activation using, for example, tf.nn.relu(), and apply dropout using tf.nn.dropout()
Create a final layer for class prediction, again using tf.matmul()
Store weights and biases using TensorFlow variables
These are just the basic steps to create the CNN model (an end-to-end sketch follows below); there are additional steps to define training and evaluation, execute the model, and tune it.
In step 2 of CNN development you create the 2D convolutional layer using tf.nn.conv2d() - this function computes a 2-D convolution given 4-D input and filter tensors.
So if you have a 1D vector, as in examples on the MNIST dataset with 784 features, you can convert it to the 4-D input required by the conv2d() function using TensorFlow's reshape method. The reshape converts it to match the picture format [Height x Width x Channel], and the tensor input then becomes 4-D: [Batch Size, Height, Width, Channel]:
x = tf.reshape(x, shape=[-1, 28, 28, 1])
where x is the placeholder vector:
x = tf.placeholder(tf.float32, [None, num_input])
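Putting the steps above together, a minimal end-to-end sketch in the same TF1 placeholder style (layer sizes, variable names, and the dropout rate are illustrative assumptions):

import tensorflow as tf

num_input, num_classes = 784, 10
x = tf.placeholder(tf.float32, [None, num_input])
x_img = tf.reshape(x, shape=[-1, 28, 28, 1])         # step 1: reshape input

w1 = tf.Variable(tf.random_normal([5, 5, 1, 32]))    # weights as variables
b1 = tf.Variable(tf.zeros([32]))
conv1 = tf.nn.relu(tf.nn.conv2d(x_img, w1, strides=[1, 1, 1, 1],
                                padding="SAME") + b1)         # step 2: conv
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1], padding="SAME")  # step 3: pooling

flat = tf.reshape(pool1, [-1, 14 * 14 * 32])         # flatten for dense layer
w2 = tf.Variable(tf.random_normal([14 * 14 * 32, 128]))
b2 = tf.Variable(tf.zeros([128]))
fc = tf.nn.relu(tf.matmul(flat, w2) + b2)            # fully connected + ReLU
fc = tf.nn.dropout(fc, keep_prob=0.75)               # dropout

w3 = tf.Variable(tf.random_normal([128, num_classes]))
b3 = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(fc, w3) + b3                      # final prediction layer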
You may refer to the official Tensorflow documentation

Channels dimension index in the input shape while porting Pytorch models to Tensorflow

One of the major problems I've encountered when converting PyTorch models to TensorFlow through ONNX is slowness, which appears to be related to the input shape, even though I was able to get bit-exact outputs with the two frameworks.
While the PyTorch input shape is B,C,H,W, the Tensorflow input shape is B,H,W,C, where B, C, H, W stand for batch size, channels, height, and width, respectively. Technically, I solve the input shape problem easily when working in Tensorflow, using two calls to np.swapaxes:
# Single image, no batch size here yet
image = np.swapaxes(image, 0, 2) # Swapping C and H dimensions - result: C,W,H
image = np.swapaxes(image, 1, 2) # Swapping H and W dimensions - result: C,H,W (like Pytorch)
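Incidentally, a single np.transpose expresses the same permutation in one call (equivalent to the two swaps above):

import numpy as np

image = np.zeros((224, 224, 3))         # H,W,C; sizes illustrative
image = np.transpose(image, (2, 0, 1))  # -> C,H,W (like Pytorch)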
The slowness problem seems to be related to the differences in the ways the convolutional operations are implemented in PyTorch vs Tensorflow. While PyTorch expects channels first, Tensorflow expects channels last.
As a result, when I visualize the models using Netron, the ONNX model looks abstract and makes sense (first image), whereas the Tensorflow .pb-formatted model looks like a big mess (second image).
Note: It appears that this problem has already concerned the writers of the onnx2keras library, which supports an experimental feature of changing the C,H,W ordering originating in Pytorch into H,W,C.
Any idea how to overcome this limitation? Are there other options for more abstractly exporting PyTorch models into Tensorflow?
ONNX (from PyTorch) - you can see the straight flow and the residual blocks:
Tensorflow (imported from the ONNX model) - almost nothing looks like a series of predefined operations:

Upsampling Tensor for TensorRT

The Tensorflow model is converted to TensorRT, and Tensorflow's ResizeArea op (the upsample in the picture) needs to be implemented as a plugin.
So ResizeArea is implemented in CUDA.
My TensorRT input is in NCHW format.
uff_path = model_to_uff(model_path)
parser.register_input(ModelData.INPUT_NAME, (3, height, width), trt.UffInputOrder.NCHW)
parser.register_output(ModelData.OUTPUT_NAME)
parser.parse(uff_path, network)
So my CUDA code implements NCHW resampling.
I'd like to make sure my resampling format is correct.
Method_1
NCHW ResizeArea (4x upsampling) sample.
Input:
channel_1 channel_2 channel_3
3,1,2,0, 0,4,3,1, 2,0,2,3,
3,0,1,2, 0,1,2,1, 2,0,4,2,
4,1,2,2, 1,3,2,4, 2,3,4,2,
Output (4x upsampled):
channel_1 channel_2 channel_3
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
Each pixel is upsampled 4 times (for example, the first pixel, 3, is repeated 4 times horizontally and 4 times vertically). That is what I consider NCHW-format upsampling.
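As a quick cross-check, the same 4x replication can be expressed in NumPy on an NCHW tensor (the channel below is channel_1 from the sample above):

import numpy as np

x = np.array([[[[3, 1, 2, 0],
                [3, 0, 1, 2],
                [4, 1, 2, 2]]]])  # NCHW: N=1, C=1, H=3, W=4
up = np.repeat(np.repeat(x, 4, axis=2), 4, axis=3)  # repeat along H, then W
print(up[0, 0])  # 12x16 block matching channel_1 of the expected output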
Method_2
Another way I implemented is upsampling in NHWC format.
The 3-channel data (83,86,77) is upsampled horizontally and vertically.
Is Method_1 the correct way of NCHW upsampling?
It seems Method_1 is OK, since TensorRT expects CHW, as said in the docs; NHWC is a TF format. Are you considering AlignCorners in your plugin layer? Also note that this resizing is nearest-neighbor; in PyTorch I used onnx-trt to do bilinear interpolation, which gave better results (in the case of segmentation; maybe for your case nearest-neighbor is OK).
After taking some time, the issue was solved. TensorRT works in NCHW format, while the Tensorflow model is in NHWC format. So in my plugin, the upsampling needs to work in NCHW format, but the output needs to be changed to NHWC format so that it can interface with the following Tensorflow operations.

What is the difference between conv1d with kernel_size=1 and dense layer?

I am building a CNN with Conv1D layers, and it trains pretty well. I'm now looking into how to reduce the number of features before feeding it into a Dense layer at the end of the model, so I've been reducing the size of the Dense layer, but then I came across this article. The article talks about the effect of using Conv2D filters with kernel_size=(1,1) to reduce the number of features.
I was wondering what the difference is between using a Conv2D layer with kernel_size=(1,1), tf.keras.layers.Conv2D(filters=n, kernel_size=(1,1)), and using a Dense layer of the same size, tf.keras.layers.Dense(units=n). From my perspective (I'm relatively new to neural nets), a filter with kernel_size=(1,1) is a single number, which is essentially equivalent to a weight in a Dense layer, and both layers have biases, so are they equivalent, or am I misunderstanding something? And if my understanding is correct, in my case where I am using Conv1D layers, not Conv2D layers, does that change anything? As in, is tf.keras.layers.Conv1D(filters=n, kernel_size=1) equivalent to tf.keras.layers.Dense(units=n)?
Please let me know if you need anything from me to clarify the question. I'm mostly curious about whether Conv1D layers with kernel_size=1 and Conv2D layers with kernel_size=(1,1) behave differently than Dense layers.
Yes, since a Dense layer is applied on the last dimension of its input (see this answer), Dense(units=N) and Conv1D(filters=N, kernel_size=1) (or Dense(units=N) and Conv2D(filters=N, kernel_size=(1,1))) are basically equivalent to each other, both in terms of connections and number of trainable parameters.
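You can check the parameter-count equivalence directly (a small sketch; the input shape (None, 50, 16) and N=8 are arbitrary):

import tensorflow as tf

dense = tf.keras.layers.Dense(units=8)
conv1 = tf.keras.layers.Conv1D(filters=8, kernel_size=1)

dense.build(input_shape=(None, 50, 16))  # Dense acts on the last dim (16)
conv1.build(input_shape=(None, 50, 16))

print(dense.count_params())  # 16*8 + 8 = 136
print(conv1.count_params())  # 1*16*8 + 8 = 136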
In a 1D CNN, the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional. Mostly used on time-series data, natural language processing tasks, etc. You'll definitely see people using it in Kaggle NLP competitions and notebooks.
In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional. Mostly used on image data.
You'll definitely see people using it in Kaggle image-processing competitions and notebooks.
In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional. Mostly used on 3D image data (MRI, CT scans). I haven't personally seen an applied version in competitions.

What is the difference between tensorflow inception and mobilenet

Recently I have been working with Tensorflow Inception V3 and MobileNet to deploy them for use in Android. While converting a retrained Inception V3 model to "tflite" there were some issues, as the "tflite" model came out empty, but when I tried with a retrained MobileNet model it was successfully converted to "tflite". So basically I have two questions:
Is it possible to convert an Inception V3 retrained model to "tflite"?
What is the difference between Inception V3 and MobileNet?
PS. I have gone through the official documentation link below, which only hinted at MobileNet being supported.
https://www.tensorflow.org/tutorials/image_retraining#other_model_architectures
Yes, both of the models can be converted to tflite format. For a step-by-step procedure please go through this link: Convert to tflite.
The major difference between Inception V3 and MobileNet is that MobileNet uses depthwise separable convolutions while Inception V3 uses standard convolutions.
This results in a smaller number of parameters in MobileNet compared to Inception V3. However, it also results in a slight decrease in performance.
In a standard convolution the filter operates on all M channels of the input image together and outputs N feature maps, i.e. the matrix multiplication between the input and the filter is multidimensional. To make it clear, take the filter as a cube of size Dk x Dk x M; in a standard convolution, each element of the cube multiplies the corresponding element in the input feature map, and after the multiplication the products are summed to produce one output value, with N such filters producing the N output feature maps.
However, in a depthwise separable convolution, M single-channel filters operate on the input one channel at a time, and once the M filter outputs are obtained, a pointwise filter of size 1 x 1 x M operates on them to give the N output feature maps. This can be understood from the figure in the MobileNet paper.
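As a concrete sketch of the parameter savings (the sizes Dk=3, M=16, N=32 are arbitrary; Keras' SeparableConv2D implements exactly this depthwise-plus-pointwise factorization):

import tensorflow as tf

std = tf.keras.layers.Conv2D(filters=32, kernel_size=3, use_bias=False)
sep = tf.keras.layers.SeparableConv2D(filters=32, kernel_size=3, use_bias=False)

std.build(input_shape=(None, 28, 28, 16))
sep.build(input_shape=(None, 28, 28, 16))

print(std.count_params())  # Dk*Dk*M*N = 3*3*16*32 = 4608
print(sep.count_params())  # Dk*Dk*M + M*N = 3*3*16 + 16*32 = 656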
To make it more clear, please go through this DataScience link. They have a concrete example of how it reduces the parameter count.