Recently I have been working with TensorFlow Inception V3 and MobileNet to deploy them for use on Android. While converting a retrained Inception V3 model to "tflite" I ran into issues: the resulting "tflite" model was empty. When I tried the same with a retrained MobileNet model, it converted to "tflite" successfully. So basically I have two questions:
Is it possible to convert inception V3 retrained model to "tflite"?
What is the difference between inception V3 and MobileNet?
P.S. I have gone through the official documentation link below, which only seems to hint at MobileNet being supported:
https://www.tensorflow.org/tutorials/image_retraining#other_model_architectures
Yes, both of the models can be converted to the tflite format. For a step-by-step procedure please go through this link: Convert to tflite.
The major difference between Inception V3 and MobileNet is that MobileNet uses depthwise separable convolutions while Inception V3 uses standard convolutions.
This results in a smaller number of parameters in MobileNet compared to Inception V3. However, it also comes with a slight decrease in performance.
In a standard convolution the filter operates on all M channels of the input together and outputs N feature maps, i.e. the multiplication between the input and the filter is multidimensional. To make it clear, take the filter as a cube of size Dk x Dk x M; then in a standard convolution each element of the cube multiplies the corresponding element in the input feature map, the results are summed across channels, and each of the N filters produces one output feature map.
However, in a depthwise separable convolution the M single-channel filters operate on the input channels one at a time (one filter per channel), and once the M filter outputs are obtained, a pointwise filter of size 1 x 1 x M operates on them to give the N output feature maps. This can be understood from the figure in the MobileNet paper.
To make it more clear please go through the DataScienceLink.
They have a concrete example of how this reduces the parameter count.
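Since that pasted example did not come through here, here is a rough sketch of the same comparison, using made-up sizes of my own (not the exact numbers from that article):

# Illustrative parameter-count comparison: a Dk x Dk kernel,
# M input channels, N output filters (sizes chosen arbitrarily).
Dk, M, N = 3, 16, 32

# Standard convolution: every one of the N filters spans all M channels.
standard_params = Dk * Dk * M * N            # 3*3*16*32 = 4608

# Depthwise separable convolution:
#   depthwise step: one Dk x Dk filter per input channel
#   pointwise step: N filters of size 1 x 1 x M
depthwise_params = Dk * Dk * M               # 144
pointwise_params = M * N                     # 512
separable_params = depthwise_params + pointwise_params  # 656

print(standard_params, separable_params)     # 4608 vs 656, roughly 7x fewer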
One of the major problems I've encountered when converting PyTorch models to TensorFlow through ONNX is slowness, which appears to be related to the input shape, even though I was able to get bit-exact outputs with the two frameworks.
While the PyTorch input shape is B,C,H,W, the Tensorflow input shape is B,H,W,C, where B,C,H,W stand for batch size, channels, height and width, respectively. Technically, I solve the input shape problem easily when working in Tensorflow, using two calls to np.swapaxes:
import numpy as np

# Single image, no batch dimension yet; assumed shape is (H, W, C)
image = np.swapaxes(image, 0, 2)  # Swapping C and H dimensions - result: C,W,H
image = np.swapaxes(image, 1, 2)  # Swapping H and W dimensions - result: C,H,W (like PyTorch)
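Incidentally, the same reordering can be done with a single np.transpose call (an equivalent alternative, not part of the original snippet):

# Equivalent single-call reordering from (H, W, C) to (C, H, W)
image = np.transpose(image, (2, 0, 1))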
The slowness problem seems to be related to the differences in the ways the convolutional operations are implemented in PyTorch vs Tensorflow. While PyTorch expects channels first, Tensorflow expects channels last.
As a result, when I visualize the models using Netron, the ONNX model looks abstract and makes sense (first image), whereas the TensorFlow .pb formatted model looks like a big mess (second image).
Note: It appears that this problem has already concerned the authors of the onnx2keras library, which supports an experimental feature for changing the C,H,W ordering originating in PyTorch into H,W,C.
Any idea how to overcome this limitation? Are there other options for more abstractly exporting PyTorch models into Tensorflow?
ONNX (from PyTorch) - you can see the straight flow and the residual blocks:
Tensorflow (imported from the ONNX model) - almost nothing looks like a series of predefined operations:
I have to manually add a convolution layer because I have a special operation which is not supported by the TensorFlow parser. What is the order of the weights that TensorRT expects to read from a .wts file? For example, a conv2d weight tensor in TensorFlow typically has the order [H, W, IN_CHANNEL, OUT_CHANNEL]. I know that TensorRT expects the input data to be in NCHW order, but does the order of the weights have to be changed too when they are written to the .wts file? If so, what order does TensorRT expect? [IN_CHANNEL, OUT_CHANNEL, H, W]?
Quick summary ... if you are asking about weights sort order, you may also be concerned with input data order as well. The answer posted here probably gets you most of what you need on both counts: Run Tensorflow with NVIDIA TensorRT Inference Engine
Additional details … I recently worked through these issues using custom tools, and here are the relevant factors I encountered:
input image data order, which is NHWC for tensorflow and NCHW for tensorrt; and, within the channel dimension, the color order, e.g. RGB vs BGR
weights sort orders by layer
for a 2D convolution, tensorflow uses RSCK ([filter_height, filter_width, input_depth, output_depth]) and tensorrt uses KCRS (see the sketch after this list)
for a dense layer following a 2D convolution or pooling layer, adjust the weights sort order for a different flattening sequence, effectively converting RSCK for tensorflow to KCRS for tensorrt, where now R and S refer to the entire input layer height and width, respectively, the C is the input_depth as before, and now the output depth K is the neuron count of the dense layer
for a dense layer following dense layer, convert CK to KC order
(note: this answer assumes you are not using groups in any of the convolutions)
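As a minimal sketch of the convolution-weight reordering mentioned above (my own illustration with made-up shapes, not code from the linked answer), the RSCK-to-KCRS conversion is just a transpose before writing the .wts file:

import numpy as np

# Hypothetical TensorFlow conv2d kernel in RSCK order:
# (filter_height R, filter_width S, input_depth C, output_depth K)
tf_kernel = np.random.rand(3, 3, 16, 32).astype(np.float32)

# Reorder to the KCRS layout TensorRT expects
trt_kernel = np.transpose(tf_kernel, (3, 2, 0, 1))  # -> (K, C, R, S)

# TensorRT reads the weights as a flat array, so flatten in KCRS order
flat_weights = trt_kernel.ravel()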
As we know, a DNN is made up of many layers, which consist of many neurons applying the same function to different parts of the input. Meanwhile, if we use TensorFlow to execute a DNN task, we get a dataflow graph generated automatically by TensorFlow, and we can use TensorBoard to visualize that dataflow graph as below. But there is no neuron in the layers. So I wonder: what is the relationship between the TensorFlow dataflow graph and a DNN? When a neuron of a DNN's layer is mapped into the dataflow graph, how is it represented? What is the relationship between a neuron in a DNN and a node in TensorFlow (representing an operation)? I just started to learn DNNs and TensorFlow, please help me arrange my thoughts in order. Thanks :)
You have to differentiate between the metaphorical representation of a DNN and its mathematical description. The math behind a classic neuron is the sum of the weighted inputs plus a bias, usually followed by an activation function applied to this result.
So in this case you have an input vector multiplied by a weight vector (containing trainable variables) and then summed up with a bias scalar (also trainable).
If you now consider a layer of neurons instead of a single one, the weights become a matrix and the bias a vector. So calculating a feed-forward layer is nothing more than a matrix multiplication followed by a vector addition.
This is the operation you can see in your tensorflow graph.
You can actually build your neural network this way without any use of the so-called high-level APIs, which use the layer abstraction. (Many did this in the early days of TensorFlow.)
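As a small illustration of that (my own sketch in the TF 1.x API of that era, with made-up shapes), a single feed-forward layer written directly from these operations looks like this:

import tensorflow as tf

# One dense layer built from raw operations: x @ W + b, then an activation.
# Shapes are illustrative: 4 input features, 3 neurons.
x = tf.placeholder(tf.float32, shape=[None, 4])   # input vectors
W = tf.Variable(tf.random_normal([4, 3]))         # trainable weight matrix
b = tf.Variable(tf.zeros([3]))                    # trainable bias vector

layer_output = tf.nn.relu(tf.matmul(x, W) + b)    # MatMul + Add + Relu nodes in the graph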
The actual "magic", which tensorflow does for you is calculating and executing the derivatives of this foreword pass in order to calculate the updates for the weights.
I have the following task: I'm supposed to find the coordinates of a target point. The features that are given are the distances from anchors to that target point. See img 1 (distances from anchors to target).
I planned to create a simple neural network first, with just an input and an output layer. The cost function I try to minimize is: correct_coordinate - mean of square(summed_up_distances * weights).
But now I'm kind of stuck on how to model the neural network so that it outputs coordinates [x, y], as the current model would just output a single value. See img 2 (current model).
Right now I would then just train 2 neural networks: one that outputs the x-value, and one that outputs the y-value.
I'm just not sure if that is the best practice with tensorflow.
So I would like to know, how would you model the NN with tensorflow?
You can build the network with 2 nodes in the output layer. There is no need to train 2 neural networks for the same task.
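A minimal sketch of what that could look like (my own example with an assumed number of anchors and a mean-squared-error loss, since the question's exact setup isn't shown):

import tensorflow as tf

num_anchors = 4                                               # assumed number of distance features

distances = tf.placeholder(tf.float32, [None, num_anchors])   # input: distances to the anchors
targets = tf.placeholder(tf.float32, [None, 2])               # labels: correct [x, y] coordinates

# Single layer with 2 output nodes: one for x, one for y
W = tf.Variable(tf.random_normal([num_anchors, 2]))
b = tf.Variable(tf.zeros([2]))
predicted_xy = tf.matmul(distances, W) + b

loss = tf.reduce_mean(tf.square(predicted_xy - targets))      # MSE over both coordinates
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)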
I know what embeddings are and how they are trained. Specifically, while going through TensorFlow's documentation, I came across two different articles. I wish to know what exactly the difference between them is.
link 1: Tensorflow | Vector Representations of words
In the first tutorial, they explicitly train embeddings on a specific dataset. There is a distinct session run to train those embeddings. I can then later save the learnt embeddings as a numpy object and use the tf.nn.embedding_lookup() function while training an LSTM network.
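For reference, using embeddings saved as a numpy array with tf.nn.embedding_lookup could look roughly like this (a sketch with assumed file name and shapes):

import numpy as np
import tensorflow as tf

# Assumed: embeddings previously learnt and saved as a numpy array
pretrained = np.load("word_embeddings.npy")          # shape: [vocabulary_size, embedding_size]

embedding_matrix = tf.constant(pretrained, dtype=tf.float32)   # frozen, not trained further
word_ids = tf.placeholder(tf.int32, [None, None])              # [batch, sequence_length]

# Look up the vector for each word id before feeding the result to the LSTM
embedded_words = tf.nn.embedding_lookup(embedding_matrix, word_ids)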
link 2: Tensorflow | Embeddings
In this second article however, I couldn't understand what is happening.
word_embeddings = tf.get_variable("word_embeddings",
                                  [vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)
This is given under the training embeddings section. My doubt is: does the gather function train the embeddings automatically? I am not sure, since this op ran very fast on my PC.
Generally: what is the right way to convert words into vectors (link 1 or link 2) in TensorFlow for training a seq2seq model? Also, how do I train the embeddings for a seq2seq dataset, since for my task the data comes as separate sequences, unlike the continuous sequence of words in the link 1 dataset?
Alright! Anyway, I have found the answer to this question and I am posting it so that others might benefit from it.
The first link is more of a tutorial that steps you through the process of exactly how the embeddings are learnt.
In practical cases, such as training seq2seq models or any other encoder-decoder models, we use the second approach, where the embedding matrix gets tuned appropriately while the model gets trained.
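As a rough sketch of that second approach (illustrative names and sizes only): the embedding matrix is just another trainable variable, so any optimizer that minimizes the model's loss will update it along with the rest of the weights.

import tensorflow as tf

vocabulary_size, embedding_size = 10000, 128          # assumed sizes

# Trainable embedding matrix, initialised randomly
word_embeddings = tf.get_variable("word_embeddings",
                                  [vocabulary_size, embedding_size])

word_ids = tf.placeholder(tf.int32, [None, None])     # [batch, sequence_length]
embedded_word_ids = tf.gather(word_embeddings, word_ids)

# ... encoder/decoder built on top of embedded_word_ids, producing some `loss` ...
# Minimizing that loss tunes word_embeddings together with the other weights:
# train_op = tf.train.AdamOptimizer().minimize(loss)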