How to use pre-trained model as non trainable sub network in tensorflow? - tensorflow

I'd like to train a network that contains a sub network that I need to stay fix during the training. The basic idea is to prepend and append some layers the the pre-trained network (inceptionV3)
new_layers -> pre-trained and fixed sub-net (inceptionv3) -> new_layers
and run the training process for the task I have without changing the pre-trained one.
I also need to branch directly on some layer of the pre-trained network. For example, with the inceptionV3 I like to uses it from the conv 299x299 to the last pool layer or from the conv 79x79 to the last pool layer.

Whether or not a "layer" is trained is determined by whether the variables used in that layer get updated with gradients. If you are using the Optimizer interface to optimize your network, then you can simply not pass the variables used in the layers that you want to keep fixed to the minimize function, i.e.,
opt.minimize(loss, <subset of variables you want to train>)
If you are using tf.gradients function directly, then remove the variables that you want to keep fixed from the second argument to tf.gradients.
Now, how you "branch directly" to a layer of a pre-trained network depends on how that network is implemented. I would simply locate the tf.Conv2D call to the 299x299 layer you are talking about, and pass as its input, the output of your new layer, and on the output side, locate the 79x79 layer, use its output as the input to your new layer.

Related

Deploying tensorflow RNN models (other than keras LSTM) to a microcontroller without unrolling the network?

Goal
I want to compare different types of RNN tflite-micro models, built using tensorflow, on a microcontroller based on their accuracy, model size and inference time. I have also created my own custom RNN cell that I want to compare with the LSTM cell, GRU cell, and SimpleRNN cell. I create the tensorflow model using tf.keras.layers.RNN(Cell(...)).
Problem
I have successfully deployed a keras LSTM-RNN using tf.keras.layers.LSTM(...) but when I create the same model using tf.keras.layers.RNN(tf.keras.layers.LSTMCell(...)) and deploy it to the microcontroller, then it does not work. I trained both networks on a batch size of 64, but then I copy the weights and biases to a model where the batch_size is fixed to 1 as tflite-micro does not support dynamic batch sizes.
When the keras LSTM layer is converted to a tflite model it creates a fused operator called UnidirectionalSequenceLSTM but the network created with an RNN layer using the LSTMCell does not have that UnidirectionalSequenceLSTM operator, instead it has a reshape and while operator. The first network has only 1 subgraph but the second has 3 subgraphs.
When I run that second model on the microcontroller, two things go wrong:
the interpreter returns the same result for different inputs
the interpreter fails on some inputs reporting an error with the while loop saying that int32 is not supported (which is in the while operator, and can't be quantized to int8)
LSTM tflite-model vizualized with Netron
RNN(LSTMCell) tflite-model vizualized with Netron
Bad solution (10x model size)
I figured out that by unrolling the second network I can successfully deploy it and get correct results on the microcontroller. However, that increases the model size 10x which is really bad as we are trying to deploy the model on a resource constrained device.
Better solution?
I have explained the problem using the example of the LSTM layer (works) and LSTM cell in an RNN layer (does not work), but I want to be able to deploy a model using the GRU cell, SimpleRNN cell, and of course the custom cell that I have created. And all those have the same problem as the network created with the LSTM cell.
What can I do?
Do I have to create a special fused operator? Maybe even one for each cell I want to compare? How would I do that?
Can I use the interface into the conversion infrastructure for user-defined RNN implementations mentioned here: https://www.tensorflow.org/lite/models/convert/rnn. How I understand the documentation, is that this would only work for user-defined LSTM implementations, not user-defined RNN implemenations like the title suggests.

Training Tensorflow only one object

Corresponding Tensorflow documentation I trained 3 objects and get result (It can recognize these objects). When I show other objects (not the 3 ones) it doesn't work correctly.
I want to train only one object (example: a cup) and recognize only this object. Is it possible to do via Tensorflow ?
Your question doesn't provide enough details, but as I can guess your trained the network with softmax activation and Categorical or SparseCategorical cross entropy loss. If my guess is right, such network always generates prediction to one of three classess, regardless to actual data, i.e. there is no option of "no-one".
In order to train network to recognize only one class of objects, make the only one output with only one channel and sigmoid activation. Use BinaryCrossEntropy loss to train your model for the specific object. Provide dataset that includes examples with this object and without it.

How to disable e.g. Dropout in tf.keras.Model to generate activation maximation images using transfer learning

I am using transfer learning and keras.applications.InceptionV3. I manage to train the model successfully.
However, when I want to generate "activation maximisation" images (e.g. the input image that maximizes the activation of one of the custom classes, ref eg https://arxiv.org/pdf/1512.02017v3.pdf ) I struggle to use the pre-trained model since I do manage to use it in "fit" mode and disable all dropouts etc.
What I do is that I combine the pre-trained model in a tf.keras.Sequential to do gradient descent on the weights of the first layer (the input image).
Despite setting base_model.trainable = False however it seems as if the pre-trained model is put into training mode (although weights are not updated) when using model.fit(data) on the outer sequential model.
Is there any way to force the base_model (a child of a Sequential) to be in "predict" mode when calling fit on the outer?
I just came across the same question. After reading some documentation and having a look on the source code of TensorFlows implementations of tf.keras.layers.Layer, tf.keras.layers.Dense, and tf.keras.layers.BatchNormalization I got the following understanding.
If training = False is passed on calling the layer, it will run in inference mode. This has nothing to do with the attribute trainable, which means something different. It would probably lead to less misunderstanding, if they would have called it training_mode instead.
When doing Transfer Learning or Fine Tuning training = False should be passed on calling the base model itself. As far as I saw until now this will only affect layers like tf.keras.layers.Dropout and tf.keras.layers.BatchNormalization and will have not effect on the other layers.
Running in inference mode via training = False will result in tf.layers.Dropout not to apply the dropout rate at all.
As tf.layers.Dropout has no trainable weights, setting the attribute trainable = False will have no effect at all,

Getting and editing gradient parameters in Caffe

Is it possible to get the gradients with respect to each layer in Caffe in CNNs, edit them and again apply the new gradients in the training process? If possible, using pycaffe interface.
For example in TensorFlow, it could be done by means of functions:
given_optimizer.compute_gradients(total_loss)
given_optimizer.apply_gradients(grads)
I'm not sure what you mean by "apply the new gradients in the training process", but you can access the gradients in the pycaffe interface:
import caffe
net = caffe.Net('/path/to/net.prototxt', '/path/to/weights.caffemodel', caffe.TEST)
# provide inputs to the net, do a pass so that meaningful data/gradients propagate to all the layers
net.forward_backward_all()
# once data/gradients are updated, you can access them
net.blobs['blob_name'].diff # access the gradient of blob 'blob_name'
net.layers[5].blobs[0].diff # access the gradient of the first parameter blob of the 6th layer
To map between layer names and layer indices, you can use this code:
list(net._layer_names).index('layer_name')
This will return the index of layer 'layer_name'.

pruning tensorflow connections and weights (using cifar10 cnn)

I'm using tensorflow to run a cnn for image classification.
I use tensorflow cifar10 cnn implementation.(tensorflow cifar10)
I want to decrease the number of connections, meaning I want to prune the low-weight connections.
How can I create a new graph(subgraph) without some of the nuerones?
Tensorflow does not allow you lock/freeze a particular kernel of a particular layer, that I have found. The only I've found to do this is to use the tf.assign() function as shown in
How to freeze/lock weights of one Tensorflow variable (e.g., one CNN kernel of one layer
It's fairly cave-man but I've seen no other solution that works. Essentially, you have to .assign() the values every so often as you iterate through the data. Since this approach is so inelegant and brute-force, it's very slow. I do the .assign() every 100 batches.
Someone please post a better solution and soon!
The cifar10 model you point to, and for that matter, most models written in TensorFlow, do not model the weights (and hence, connections) of individual neurons directly in the computation graph. For instance, for fully connected layers, all the connections between the two layers, say, with M neurons in the layer below, and 'N' neurons in the layer above, are modeled by one MxN weight matrix. If you wanted to completely remove a neuron and all of its outgoing connections from the layer below, you can simply slice out a (M-1)xN matrix by removing the relevant row, and multiply it with the corresponding M-1 activations of the neurons.
Another way is add an addition mask to control the connections.
The first step involves adding mask and threshold variables to the
layers that need to undergo pruning. The variable mask is the same
shape as the layer's weight tensor and determines which of the weights
participate in the forward execution of the graph.
There is a pruning implementation under tensorflow/contrib/model_pruning to prune the model. Hope this can help you to prune model quickly.
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
I think google has an updated answer here : https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
Removing pruning nodes from the trained graph:
$ bazel build -c opt contrib/model_pruning:strip_pruning_vars
$ bazel-bin/contrib/model_pruning/strip_pruning_vars --checkpoint_path=/tmp/cifar10_train --output_node_names=softmax_linear/softmax_linear_2 --filename=cifar_pruned.pb
I suppose that cifar_pruned.pb will be smaller, since the pruned "or zero masked" variables are removed.