Convert ResNet implementation from Caffe to TensorFlow

I want to implement ResNet-50 from scratch.
It is implemented in Caffe by the authors of the original paper, but I want a TensorFlow implementation,
based on this repository: https://github.com/KaimingHe/deep-residual-networks
and this visualization of the network: http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
I know the TensorFlow equivalent of every layer, but I don't know the meaning of the in-place Scale layer that follows batch normalization. Can you explain what it means, and also what the "use global stats" (use_global_stats) parameter in BatchNorm does?

An "in-place" layer in Caffe is simply a hint to save memory: instead of allocating separate memory for the input and the output of the layer, an "in-place" layer overwrites its input with its output.
Using global statistics (use_global_stats) in a "BatchNorm" layer means using the mean and variance computed during training and not updating these values any further. This is the "deployment" state of the BN layer.
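To make the deployment behaviour concrete, here is a minimal pure-Python sketch (not Caffe or TensorFlow code). It assumes the usual split in the ResNet prototxt: the BatchNorm layer only normalizes with the stored statistics, and the Scale layer that follows applies a learned scale and shift. The function names and all numeric values are hypothetical, and eps is set to 0 only to keep the arithmetic exact.

```python
import math

def batch_norm_inference(x, mean, var, eps=1e-5):
    """Normalize with fixed (stored) statistics; nothing is updated here."""
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def scale(x, gamma, beta):
    """Learned scale and shift applied after normalization."""
    return [gamma * xi + beta for xi in x]

# Hypothetical stored statistics and learned parameters for one channel.
stored_mean, stored_var = 2.0, 4.0
gamma, beta = 3.0, 1.0

activations = [0.0, 2.0, 4.0]
normalized = batch_norm_inference(activations, stored_mean, stored_var, eps=0.0)
output = scale(normalized, gamma, beta)
print(normalized)  # [-1.0, 0.0, 1.0]
print(output)      # [-2.0, 1.0, 4.0]
```

In TensorFlow a single batch-normalization layer typically covers both steps (normalization plus learned scale and offset), so the Caffe BatchNorm/Scale pair usually maps to one layer run in inference mode.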

Related

Deploying tensorflow RNN models (other than keras LSTM) to a microcontroller without unrolling the network?

Goal
I want to compare different types of RNN tflite-micro models, built using tensorflow, on a microcontroller based on their accuracy, model size and inference time. I have also created my own custom RNN cell that I want to compare with the LSTM cell, GRU cell, and SimpleRNN cell. I create the tensorflow model using tf.keras.layers.RNN(Cell(...)).
Problem
I have successfully deployed a keras LSTM-RNN using tf.keras.layers.LSTM(...) but when I create the same model using tf.keras.layers.RNN(tf.keras.layers.LSTMCell(...)) and deploy it to the microcontroller, then it does not work. I trained both networks on a batch size of 64, but then I copy the weights and biases to a model where the batch_size is fixed to 1 as tflite-micro does not support dynamic batch sizes.
When the keras LSTM layer is converted to a tflite model it creates a fused operator called UnidirectionalSequenceLSTM but the network created with an RNN layer using the LSTMCell does not have that UnidirectionalSequenceLSTM operator, instead it has a reshape and while operator. The first network has only 1 subgraph but the second has 3 subgraphs.
When I run that second model on the microcontroller, two things go wrong:
the interpreter returns the same result for different inputs
the interpreter fails on some inputs reporting an error with the while loop saying that int32 is not supported (which is in the while operator, and can't be quantized to int8)
[Image: LSTM tflite model visualized with Netron]
[Image: RNN(LSTMCell) tflite model visualized with Netron]
Bad solution (10x model size)
I figured out that by unrolling the second network I can successfully deploy it and get correct results on the microcontroller. However, that increases the model size 10x which is really bad as we are trying to deploy the model on a resource constrained device.
Better solution?
I have explained the problem using the example of the LSTM layer (works) and LSTM cell in an RNN layer (does not work), but I want to be able to deploy a model using the GRU cell, SimpleRNN cell, and of course the custom cell that I have created. And all those have the same problem as the network created with the LSTM cell.
What can I do?
Do I have to create a special fused operator? Maybe even one for each cell I want to compare? How would I do that?
Can I use the interface into the conversion infrastructure for user-defined RNN implementations mentioned here: https://www.tensorflow.org/lite/models/convert/rnn? As I understand the documentation, this would only work for user-defined LSTM implementations, not for user-defined RNN implementations as the title suggests.

Set batch size of trained keras model to 1

I have a Keras model trained on my own dataset. However, after loading the weights, the summary shows None as the first dimension (the batch size).
I want to know how to fix the shape to a batch size of 1, because I need a fixed batch size to convert the model to tflite with GPU support.
What worked for me was to specify the batch size on the Input layer, like this:
input = layers.Input(shape=input_shape, batch_size=1, dtype='float32', name='images')
This then carried through the rest of the layers.
The bad news is that despite this "fix" the TFLite runtime still complains about dynamic tensors. I get these non-fatal errors in logcat when it runs:
E/tflite: third_party/tensorflow/lite/core/subgraph.cc:801 tensor.data.raw != nullptr was not true.
E/tflite: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#26 is a dynamic-sized tensor).
E/tflite: Ignoring failed application of the default TensorFlow Lite delegate indexed at 0.
The good news is that despite these errors it seems to be using the GPU anyway, based on performance testing.
I'm using:
tensorflow-lite-support:0.2.0
tensorflow-lite-metadata:0.2.1
tensorflow-lite:2.6.0
tensorflow:tensorflow-lite-gpu:2.3.0
Hopefully, they'll fix the runtime so it doesn't matter whether the batch size is 'None'. It shouldn't matter for doing inference.

Tensorflow lite micro neural network layers building

I am trying to run some ML on my ESP32, and I want to use TensorFlow Lite Micro. But I don't really understand how the layers are built up. Here is the example showing how to train the person detection model:
Person detection model training
It is clear, but at the end they say:
MobileNet v1 is a stack of 14 of these depthwise separable convolution layers with an average pool, then a fully-connected layer followed by a softmax at the end.
If I check the sample code, where they build up the tf lite micro model, it only has 3 lines:
static tflite::MicroMutableOpResolver<3> micro_op_resolver;
micro_op_resolver.AddAveragePool2D();
micro_op_resolver.AddConv2D();
micro_op_resolver.AddDepthwiseConv2D();
There is the average pool and the depthwise layer, but where does the Conv2D layer come from? Also, only one depthwise layer is present, while according to the documentation the model has 14 depthwise layers.
So the question is: is there any relation between the trained model and the model I should build in TensorFlow Lite Micro? If there is, how can I determine how to build it up? And if there is no relation, how do I need to build up the model?
They don't explicitly build the model; they rely on a model file that contains the architecture (source):
model = tflite::GetModel(g_person_detect_model_data);
Here g_person_detect_model_data is defined in person_detect_model_data.cc, which is generated from a tflite model (containing the architecture) with the following commands (see "Converting into a C source file" in the Readme):
# Install xxd if it is not available
!apt-get -qq install xxd
# Save the file as a C source file
!xxd -i vww_96_grayscale_quantized.tflite > person_detect_model_data.cc
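For readers without xxd, the conversion step can be approximated in a few lines of Python. This is only a sketch of the idea, not the tool used in the example: the helper name to_c_array and the byte values are hypothetical, and the exact variable naming and formatting of real `xxd -i` output may differ.

```python
def to_c_array(data: bytes, var_name: str, per_line: int = 12) -> str:
    """Render bytes as a C array definition, similar in spirit to `xxd -i`."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"unsigned int {var_name}_len = {len(data)};\n"
    )

# Placeholder bytes standing in for the contents of a .tflite file.
fake_model = bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33])
c_source = to_c_array(fake_model, "g_person_detect_model_data")
print(c_source)
```

With a real model you would read the bytes from the .tflite file instead and write the returned string to a .cc file. Either way, the generated array carries the whole serialized graph, which is why no layers are built in the C++ code.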
So the code you shared doesn't build the model. What you see is that, to save code space, they explicitly register only the ops needed by the model instead of relying on the larger tflite::AllOpsResolver.
This is indicated by the comment above the code you shared:
// Pull in only the operation implementations we need.
// This relies on a complete list of all the ops needed by this graph.
// An easier approach is to just use the AllOpsResolver, but this will
// incur some penalty in code space for op implementations that are not
// needed by this graph.
//
// tflite::AllOpsResolver resolver;
// NOLINTNEXTLINE(runtime-global-variables)
static tflite::MicroMutableOpResolver<5> micro_op_resolver;
micro_op_resolver.AddAveragePool2D();
micro_op_resolver.AddConv2D();
micro_op_resolver.AddDepthwiseConv2D();
micro_op_resolver.AddReshape();
micro_op_resolver.AddSoftmax();

pruning tensorflow connections and weights (using cifar10 cnn)

I'm using TensorFlow to run a CNN for image classification.
I use the TensorFlow CIFAR-10 CNN implementation (tensorflow cifar10).
I want to decrease the number of connections, meaning I want to prune the low-weight connections.
How can I create a new graph (subgraph) without some of the neurons?
As far as I have found, TensorFlow does not allow you to lock/freeze a particular kernel of a particular layer. The only way I've found to do this is to use the tf.assign() function as shown in
How to freeze/lock weights of one Tensorflow variable (e.g., one CNN kernel of one layer)
It's fairly cave-man but I've seen no other solution that works. Essentially, you have to .assign() the values every so often as you iterate through the data. Since this approach is so inelegant and brute-force, it's very slow. I do the .assign() every 100 batches.
Someone please post a better solution and soon!
The cifar10 model you point to, and for that matter most models written in TensorFlow, do not model the weights (and hence the connections) of individual neurons directly in the computation graph. For instance, for fully connected layers, all the connections between two layers, say with M neurons in the layer below and N neurons in the layer above, are modeled by one MxN weight matrix. If you wanted to completely remove a neuron and all of its outgoing connections from the layer below, you can simply slice out an (M-1)xN matrix by removing the relevant row, and multiply it with the corresponding M-1 activations of the remaining neurons.
Another way is to add an additional mask to control the connections.
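The row-slicing idea can be illustrated with a toy layer in plain Python. This is a sketch under stated assumptions, not TensorFlow code: the helper names matvec_rows and remove_neuron and all values are made up for illustration.

```python
def matvec_rows(activations, weights):
    """Multiply activations (length M) by weights (M x N), giving length N."""
    n_out = len(weights[0])
    return [
        sum(a * row[j] for a, row in zip(activations, weights))
        for j in range(n_out)
    ]

def remove_neuron(activations, weights, index):
    """Drop lower-layer neuron `index`: one activation and its weight row."""
    kept_acts = activations[:index] + activations[index + 1:]
    kept_rows = weights[:index] + weights[index + 1:]
    return kept_acts, kept_rows

# Toy layer: M = 3 lower neurons, N = 2 upper neurons (illustrative values).
acts = [1.0, 2.0, 3.0]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]

full = matvec_rows(acts, W)
pruned_acts, pruned_W = remove_neuron(acts, W, index=1)
pruned = matvec_rows(pruned_acts, pruned_W)
print(full)    # [7.0, 8.0]
print(pruned)  # [7.0, 6.0]
```

Removing the middle neuron leaves a 2x2 matrix and only changes the outputs that neuron fed into.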
The first step involves adding mask and threshold variables to the
layers that need to undergo pruning. The variable mask is the same
shape as the layer's weight tensor and determines which of the weights
participate in the forward execution of the graph.
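As a rough illustration of the mask idea in the quoted passage, the sketch below builds a 0/1 mask from a magnitude threshold and applies it to a weight matrix. It is plain Python rather than the model_pruning library; the function names, the thresholding rule, and the values are assumptions made for the example.

```python
def magnitude_mask(weights, threshold):
    """Mask with the same shape as weights: 1.0 keeps a weight, 0.0 prunes it."""
    return [[1.0 if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def apply_mask(weights, mask):
    """Only unmasked weights take part in the forward computation."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

# Illustrative weights; the two small entries fall below the threshold.
W = [[0.5, 0.01],
     [0.02, -0.8]]
mask = magnitude_mask(W, threshold=0.1)
masked_W = apply_mask(W, mask)
print(mask)      # [[1.0, 0.0], [0.0, 1.0]]
print(masked_W)  # [[0.5, 0.0], [0.0, -0.8]]
```

The original weights stay untouched, so the mask can be recomputed as the threshold changes during training.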
There is a pruning implementation under tensorflow/contrib/model_pruning that you can use to prune the model. I hope this helps you prune the model quickly.
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
I think Google has an updated answer here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
Removing pruning nodes from the trained graph:
$ bazel build -c opt contrib/model_pruning:strip_pruning_vars
$ bazel-bin/contrib/model_pruning/strip_pruning_vars --checkpoint_path=/tmp/cifar10_train --output_node_names=softmax_linear/softmax_linear_2 --filename=cifar_pruned.pb
I suppose that cifar_pruned.pb will be smaller, since the pruned (zero-masked) variables are removed.

How to use pre-trained model as non trainable sub network in tensorflow?

I'd like to train a network that contains a sub-network that needs to stay fixed during training. The basic idea is to prepend and append some layers to the pre-trained network (InceptionV3)
new_layers -> pre-trained and fixed sub-net (inceptionv3) -> new_layers
and run the training process for the task I have without changing the pre-trained one.
I also need to branch directly on some layer of the pre-trained network. For example, with InceptionV3 I would like to use it from the 299x299 conv layer to the last pool layer, or from the 79x79 conv layer to the last pool layer.
Whether or not a "layer" is trained is determined by whether the variables used in that layer get updated with gradients. If you are using the Optimizer interface to optimize your network, then you can simply not pass the variables used in the layers that you want to keep fixed to the minimize function, i.e.,
opt.minimize(loss, var_list=<subset of variables you want to train>)
If you are using tf.gradients function directly, then remove the variables that you want to keep fixed from the second argument to tf.gradients.
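The effect of restricting the variable list can be shown with a toy gradient-descent loop in plain Python. This is an analogy, not TensorFlow code: the model a*x + b, the helper names, and the learning rate are all hypothetical, with "a" standing in for a pre-trained weight that must stay fixed.

```python
def gradients(params, x, target):
    """Gradients of the squared error (a*x + b - target)**2 w.r.t. a and b."""
    err = params["a"] * x + params["b"] - target
    return {"a": 2.0 * err * x, "b": 2.0 * err}

def minimize_step(params, grads, var_list, lr=0.1):
    """Apply one gradient-descent step only to the names in var_list."""
    return {
        name: value - lr * grads[name] if name in var_list else value
        for name, value in params.items()
    }

params = {"a": 2.0, "b": 0.0}  # "a" plays the pre-trained, frozen weight
for _ in range(50):
    grads = gradients(params, x=1.0, target=5.0)
    params = minimize_step(params, grads, var_list=["b"])

print(params["a"])  # unchanged: 2.0
```

Gradients are still computed through the frozen weight, so layers placed before the fixed sub-network can be trained; the frozen weight itself is simply never updated.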
Now, how you "branch directly" to a layer of a pre-trained network depends on how that network is implemented. I would simply locate the convolution call (for example, tf.nn.conv2d) for the 299x299 layer you are talking about and pass the output of your new layer as its input; on the output side, locate the 79x79 layer and use its output as the input to your new layer.