GluonCV object detection fine-tuning - Select which layers are modified (freeze the rest) - object-detection

I have a question about the procedure for fine-tuning a pre-trained object detection model with GluonCV, described in this tutorial.
As far as I understand, the described procedure modifies all the weight values in the model.
I wanted to only fine-tune the fully connected layer at the end of the network, and freeze the rest of the weights.
I assume that I should specify which parameters I want to modify when creating the Trainer:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})
so, instead of net.collect_params(), I should list the parameters I’m interested in training, and run the rest of the process normally.
However, I don’t know how to isolate these parameters precisely. I tried printing:
params = net.collect_params()
but, out of this list, I don’t know which ones correspond to the final FC layers. Any suggestions?

Let's say we have a pretrained Gluon model for a classification task:
>>> import mxnet as mx
>>> net = mx.gluon.nn.HybridSequential()
>>> net.add(mx.gluon.nn.Conv2D(channels=6, kernel_size=5, padding=2, activation='sigmoid'))
>>> net.add(mx.gluon.nn.MaxPool2D(pool_size=2, strides=2))
>>> net.add(mx.gluon.nn.Flatten())
>>> net.add(mx.gluon.nn.Dense(units=10))
>>> net.collect_params()
hybridsequential0_ (
Parameter conv0_weight (shape=(6, 0, 5, 5), dtype=float32)
Parameter conv0_bias (shape=(6,), dtype=float32)
Parameter dense0_weight (shape=(10, 0), dtype=float32)
Parameter dense0_bias (shape=(10,), dtype=float32)
)
To fine-tune this convolutional network, we want to freeze all the blocks except Dense.
First, recall that the collect_params method accepts a regex string to select specific block parameters by their names (or prefixes; the prefix parameter of Conv2D, Dense, or any other Gluon (hybrid) block). By default, the prefix is derived from the class name, i.e. for a Conv2D block it is conv0_, conv1_, etc. Moreover, collect_params returns an instance of mxnet.gluon.parameter.ParameterDict, which has a setattr method.
Solution:
>>> conv_params = net.collect_params('(?!dense).*')
>>> conv_params.setattr('grad_req', 'null')
or simply
>>> net.collect_params('(?!dense).*').setattr('grad_req', 'null')
Here we exclude all the parameters matching dense to get only conv blocks and set their grad_req attributes to 'null'. Now, training the model net with mxnet.gluon.Trainer will update only dense parameters.
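To tie this back to the Trainer from the question: you can also hand only the parameters you want trained to the Trainer. A minimal sketch (the '.*dense.*' regex is an assumption matching the dense parameter names above):
>>> trainer = mx.gluon.Trainer(net.collect_params('.*dense.*'), 'sgd',
...                            {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})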
It is more convenient to have a pretrained model with separate attributes indicating specific blocks, e.g. the features block, anchor generators etc. In our case, we have a convolutional network that extracts features and passes them to an output block.
class ConvNet(mx.gluon.nn.HybridBlock):
    def __init__(self, n_classes, params=None, prefix=None):
        super().__init__(params=params, prefix=prefix)
        self.features = mx.gluon.nn.HybridSequential()
        self.features.add(mx.gluon.nn.Conv2D(channels=6, kernel_size=5, padding=2,
                                             activation='sigmoid'))
        self.features.add(mx.gluon.nn.MaxPool2D(pool_size=2, strides=2))
        self.features.add(mx.gluon.nn.Flatten())
        self.output = mx.gluon.nn.Dense(units=n_classes)

    def hybrid_forward(self, F, x):
        x = self.features(x)
        return self.output(x)
With this convnet declaration, we don't have to use regexps to access required blocks:
>>> net = ConvNet(n_classes=10)
>>> net.features.collect_params().setattr('grad_req', 'null')
GluonCV models follow exactly this pattern. See the documentation of the model you are using and choose the blocks you would like to freeze. If the docs are not helpful, run collect_params to see all the parameters, use a regex to select everything except the ones you want to fine-tune, and set the selected parameters' grad_req to 'null'.
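For example, a minimal sketch with a detector from the GluonCV model zoo (assuming the model exposes its backbone as a features block, as the SSD models do):
>>> import gluoncv
>>> net = gluoncv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
>>> net.features.collect_params().setattr('grad_req', 'null')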

Related

Keras remove activation function of last layer

I want to use ResNet50 with Imagenet weights.
The last layer of ResNet50 is (from here)
x = layers.Dense(1000, activation='softmax', name='fc1000')(x)
I need to keep the weights of this layer but remove the softmax function.
I want to manually change it so my last layer looks like this
x = layers.Dense(1000, name='fc1000')(x)
but the weights stay the same.
Currently I call my net like this
resnet = Sequential([
    Input(shape=(224,224,3)),
    ResNet50(weights='imagenet', input_shape=(224,224,3))
])
I need the Input layer because otherwise the model.compile says that placeholders aren't filled.
Generally there are two ways of achieving this:
Quick way - supported functions:
To change the final layer's activation function, you can pass the classifier_activation argument.
So in order to get rid of the activation altogether, your model can be built like:
import tensorflow as tf
resnet = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224,224,3)),
    tf.keras.applications.ResNet50(
        weights='imagenet',
        input_shape=(224,224,3),
        pooling="avg",
        classifier_activation=None
    )
])
This, however, is not going to work if you want a different function that is not supported by the Keras classifier_activation parameter (e.g. a custom activation function).
To achieve this you can use the workaround solution:
Long way - copy the model's weights
This solution proposes copying the original model's weights onto your custom one. The approach works because, apart from the activation function, you are not changing the model's architecture.
You need to:
1. Download the original model.
2. Save its weights.
3. Declare your modified version of the model (in your case, without the activation function).
4. Set the weights of the new model.
The snippet below shows this concept in more detail:
import tensorflow as tf
# 1. Download the original resnet
resnet = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224,224,3)),
    tf.keras.applications.ResNet50(
        weights='imagenet',
        input_shape=(224,224,3),
        pooling="avg"
    )
])

# 2. Hold the weights in memory
imagenet_weights = resnet.get_weights()

# 3. Declare the model, but without softmax
resnet_no_softmax = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224,224,3)),
    tf.keras.applications.ResNet50(
        include_top=False,
        weights='imagenet',
        input_shape=(224,224,3),
        pooling="avg"
    ),
    tf.keras.layers.Dense(1000, name='fc1000')
])

# 4. Pass the imagenet weights onto the second resnet
resnet_no_softmax.set_weights(imagenet_weights)
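A quick sanity check (a sketch I'm adding, not part of the original snippet): applying a softmax to the new model's logits should reproduce the original model's probabilities:
import numpy as np

x = np.random.rand(1, 224, 224, 3).astype("float32")
probs = resnet.predict(x)                 # original model: softmax outputs
logits = resnet_no_softmax.predict(x)     # new model: raw logits
np.testing.assert_allclose(tf.nn.softmax(logits).numpy(), probs, atol=1e-4)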
Hope this helps!

Getting keras LSTM layer to accept two inputs?

I'm working with padded sequences of maximum length 50. I have two types of sequence data:
1) A sequence, seq1, of integers (1-100) that correspond to event types, e.g. [3, 6, 3, 1, 45, 45, ..., 3]
2) A sequence, seq2, of integers representing time, in minutes, from the last event in seq1. So the last element is zero by definition, for example [100, 96, 96, 45, 44, 12, ..., 0]. seq1 and seq2 have the same length, 50.
I'm trying to run the LSTM primarily on the event/seq1 data, but have the time/seq2 strongly influence the forget gate within the LSTM. The reason for this is I want the LSTM to tend to really penalize older events and be more likely to forget them. I was thinking about multiplying the forget weight by the inverse of the current value of the time/seq2 sequence, or maybe 1/(seq2_element + 1) to handle cases where it's zero minutes.
I see in the keras code (LSTMCell class) where the change would have to be:
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))
So I need to modify keras' LSTM code to accept multiple inputs. As an initial test, within the LSTMCell class, I changed the call function to look like this:
def call(self, inputs, states, training=None):
    time_input = inputs[1]
    inputs = inputs[0]
So that it can handle two inputs given as a list.
When I try running the model with the Functional API:
# Input 1: event type sequences
# Take the event integer sequences, run them through an embedding layer to get float vectors, then run through LSTM
main_input = Input(shape=(max_seq_length,), dtype='int32', name='main_input')
x = Embedding(output_dim=embedding_length, input_dim=num_unique_event_symbols,
              input_length=max_seq_length, mask_zero=True)(main_input)
## Input 2: time vectors
auxiliary_input = Input(shape=(max_seq_length, 1), dtype='float32', name='aux_input')
m = Masking(mask_value=99999999.0)(auxiliary_input)
lstm_out = LSTM(32)(x, time_vector=m)
# Auxiliary loss here from first input
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# An arbitrary number of dense, hidden layers here
x = Dense(64, activation='relu')(lstm_out)
# The main output node
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
## Compile and fit the model
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'], loss_weights=[1., 0.2])
print(model.summary())
np.random.seed(21)
model.fit([train_X1, train_X2], [train_Y, train_Y], epochs=1, batch_size=200)
However, I get the following error:
An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 50, 1), ndim=3)]; however `cell.state_size` is (32, 32)
Any advice?
You can't pass a list of inputs to the default recurrent layers in Keras. The input_spec is fixed, and the recurrent code is implemented for a single tensor input, as also pointed out in the documentation; it doesn't magically iterate over two inputs with the same timesteps and pass that to the cell. This is partly because of how the iterations are optimised and the assumptions made if the network is unrolled, etc.
If you want two inputs, you can pass constants (doc) to the cell, which will pass the tensor along as-is. This is mainly intended for implementing attention models in the future. So one input will be iterated over timesteps while the other will not. If you really want the two inputs to be iterated together like zip() in Python, you will have to implement a custom layer.
I would like to throw in a couple of different ideas here; they don't require you to modify the Keras code.
After the embedding layer of the event types, stack the embeddings with the elapsed time; the Keras layer for this is keras.layers.Concatenate(axis=-1). Imagine this: a single event type is mapped to an n-dimensional vector by the embedding layer. You just add the elapsed time as one more dimension after the embedding, so that it becomes an (n+1)-dimensional vector.
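A minimal sketch of this concatenation idea (the sizes and names are assumptions carried over from the question, not code from the original answer):
import tensorflow as tf
from tensorflow.keras import layers

max_seq_length = 50
num_unique_event_symbols = 101   # assumption: event ids 1-100 plus 0 for padding
embedding_length = 16            # assumption

main_input = layers.Input(shape=(max_seq_length,), dtype='int32')
aux_input = layers.Input(shape=(max_seq_length, 1), dtype='float32')

x = layers.Embedding(num_unique_event_symbols, embedding_length)(main_input)
x = layers.Concatenate(axis=-1)([x, aux_input])   # (batch, 50, n+1)
lstm_out = layers.LSTM(32)(x)
main_output = layers.Dense(1, activation='sigmoid')(lstm_out)
model = tf.keras.Model([main_input, aux_input], main_output)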
Another idea, somewhat related to your problem/question and which may help here, is 1D convolution. The convolution can happen right after the concatenated embeddings. The intuition for applying convolution to event types and elapsed time is that a 1x1 convolution linearly combines the two together, with the combination weights being trained. Note that in convolution terms, the dimensions of the vectors are called channels. Of course, you can also convolve more than one event at a step. Just try it; it may or may not help.
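Continuing the sketch above, the 1x1 convolution would slot in right after the concatenation (the filter count is an arbitrary choice):
# linearly mixes the embedding channels with the time channel at every timestep
x = layers.Conv1D(filters=embedding_length, kernel_size=1, activation='relu')(x)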

How can I use TensorFlow's sampled softmax loss function in a Keras model?

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure of what is expected as input for these. It seems like I could write a custom function in Keras as follows:
import keras.backend as K

def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
    return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)
However, I'm unsure of how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:
model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(output_dim=len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or does this need to be written as a layer that comes after the final fully-connected layer? Any guidance here would be greatly appreciated. Thanks.
The key observation here is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the set of possible labels to compare with the ground truth data to then compute losses as a separate step. This makes the model setup a little bit weird.
First, we add a second input layer to the model that encodes the target (training) data a second time as an input, in addition to being the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we go to instantiate and set up the model.
Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and then the second input that contains a copy of the training data. Note that we're doing some serious hackery accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.
Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax to the output of the original dense layer (which, with the default settings, has no activation applied).
There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here now as-is rather than wait until I have something that's a little bit neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.
from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)

    def call(self, inputs):
        """
        The first input should be the model as it were, and the second the
        target (i.e., a repeat of the training data) to compute the labels
        argument.
        """
        # the labels input to this function is batch size by 1, where the
        # value at position (i, 1) is the index that is true (not zero)
        # e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
        return K.tf.nn.sampled_softmax_loss(
            weights=inputs[0]._keras_history[0].weights[0],
            biases=inputs[0]._keras_history[0].bias,
            inputs=inputs[0],
            labels=K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1]),
            num_sampled=1000,
            num_classes=200000)

def custom_loss(y_true, y_pred):
    return K.tf.reduce_mean(y_pred)

num_classes = 200000
input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))
dense = Dense(num_classes)
outputs = dense(input)
outputs = SampledSoftmax()([outputs, target_input])
model = Model([input, target_input], outputs)
model.compile(optimizer='adam', loss=custom_loss)
# train as desired
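For inference, per the note above about re-using the trained layers, a minimal sketch of the companion model (my assumption of what that re-use looks like, re-using the dense layer and input tensor from the snippet):
from keras.layers import Activation

# apply a standard softmax to the trained dense layer for validation/inference
inference_outputs = Activation('softmax')(dense(input))
inference_model = Model(input, inference_outputs)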

Keras: difference of InputLayer and Input

I made a model using Keras with TensorFlow. I use InputLayer with these lines of code:
img1 = tf.placeholder(tf.float32, shape=(None, img_width, img_height, img_ch))
first_input = InputLayer(input_tensor=img1, input_shape=(img_width, img_height, img_ch))
first_dense = Conv2D(16, 3, 3, activation='relu', border_mode='same', name='1st_conv1')(first_input)
But I get this error:
ValueError: Layer 1st_conv1 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.engine.topology.InputLayer'>. Full input: [<keras.engine.topology.InputLayer object at 0x00000000112170F0>]. All inputs to the layer should be tensors.
When I use Input like this, it works fine:
first_input = Input(tensor=img1, shape=(224, 224, 3), name='1st_input')
first_dense = Conv2D(16, 3, 3, activation='relu', border_mode='same', name='1st_conv1')(first_input)
What is the difference between Inputlayer and Input?
InputLayer is a layer.
Input is a tensor.
You can only call layers passing tensors to them.
The idea is:
outputTensor = SomeLayer(inputTensor)
So, only Input can be passed because it's a tensor.
Honestly, I have no idea about the reason for the existence of InputLayer. Maybe it's supposed to be used internally. I never used it, and it seems I'll never need it.
According to the tensorflow website, "It is generally recommended to use the functional layer API via Input (which creates an InputLayer) without directly using InputLayer."
See that documentation page for more.
Input: used for creating a functional model
inp = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(16, 3)(inp)
InputLayer: used for creating a sequential model
model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(224, 224, 3)))
And the other difference is that
When using InputLayer with the Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
That is, in a Sequential model you can skip the InputLayer and specify the shape directly in the first layer, i.e. from this
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(4,)),
    tf.keras.layers.Dense(8)])
To this
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,))])
To define it in simple words:
keras.layers.Input is used to instantiate a Keras Tensor. In this case, your data is probably not a tf tensor, maybe an np array.
On the other hand, keras.layers.InputLayer is a layer where your data is already defined as one of the tf tensor types, i.e., can be a ragged tensor or constant or other types.
I hope this helps!

How to get weights in tf.layers.dense?

I want to draw the weights of tf.layers.dense in a TensorBoard histogram, but they do not show up in the parameters. How can I do that?
The weights are added as a variable named kernel, so you could use
x = tf.layers.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')
You can obviously replace tf.get_default_graph() by any other graph you are working in.
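To actually get this into a TensorBoard histogram, which is what the question asks about, a minimal TF1-style sketch (here inputs stands in for whatever tensor feeds the layer):
import os
import tensorflow as tf

x = tf.layers.dense(inputs, units=300, name='ip1')
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')
tf.summary.histogram('ip1_kernel', weights)  # collected by tf.summary.merge_all()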
I came across this problem and just solved it. The name of tf.layers.dense is not necessarily the same as the prefix of the kernel's name: my tensor was "dense_2/xxx" but its kernel was "dense_1/kernel:0". To make sure tf.get_variable works, you should set name=xxx in the tf.layers.dense call so that the two names share the same prefix. It works as in the demo below:
l = tf.layers.dense(input_tf_xxx, 300, name='ip1')
with tf.variable_scope('ip1', reuse=True):
    w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest tensorflow layers api creates all the variables using the tf.get_variable call. This ensures that if you wish to use the variable again, you can just use the tf.get_variable function and provide the name of the variable that you wish to obtain.
In the case of a tf.layers.dense, the variable is created as: layer_name/kernel. So, you can obtain the variable by saying:
with tf.variable_scope("layer_name", reuse=True):
    # do not specify the shape here or it will confuse
    # tensorflow into creating a new variable
    weights = tf.get_variable("kernel")
[Edit]: The new version of Tensorflow now has both Functional and Object-Oriented interfaces to the layers API. If you need the layers only for computational purposes, the functional API is a good choice; those function names start with lowercase letters, for instance tf.layers.dense(...). Layer objects are created with capitalized names, e.g. tf.layers.Dense(...). Once you have a handle to this layer object, you can use all of its functionality. For obtaining the weights, just use obj.trainable_weights; this returns a list of all the trainable variables found in that layer's scope.
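A short sketch of that object-oriented route (input_tf_xxx is a stand-in for whatever tensor feeds the layer, as in the earlier demo):
layer = tf.layers.Dense(300, name='ip1')
y = layer(input_tf_xxx)          # builds the layer's kernel and bias variables
print(layer.trainable_weights)   # [kernel variable, bias variable]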
I am going crazy with tensorflow.
I run this:
sess.run(x.kernel)
after training, and I get the weights.
Comes from the properties described here.
I am saying that I am going crazy because it seems that there are a million slightly different ways to do something in tf, and that fragments the tutorials around.
Is there anything wrong with
model.get_weights()
After I create a model, compile it, and run fit, this function returns a list of numpy arrays with the weights for me.
In TF 2, if you're inside a @tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (the default in TF2, except inside @tf.function-decorated functions):
weights = optimizer.get_weights()
In TF2, weights will output a list of length 2:
weights_out[0] = kernel weight
weights_out[1] = bias weight
For example, for the second layer (layers[0] is the input layer, which has no weights) in a model with layer size 50 and input size 784:
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(...)
model.fit(...)
kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight)) # 2
print(kernel_weight.shape) # (784,50)
print(bias_weight.shape) # (50,)
Try making a loop to get the weights of each layer in your sequential network, by printing the name of the layer first, which you can get from:
model.summary()
Then you can get the weights of each layer by running this code:
for layer in model.layers:
    print(layer.name)
    print(layer.get_weights())