It appears that
model.layers[n].rate
can be viewed and changed, but does not reach the back end and actually change training behavior. What's the easiest way to change it for real? I'm hoping not to have to make a whole new model and transfer the weights.
The easiest way to achieve this would be:
Change the rates in the layers
model.layers[i].rate = 0.04 #layer[i] is the dropout layer
Clone this model to a new model using
model = keras.models.clone_model(model)  # weights are reinitialized
Compile the new model
model.compile(optimizer=..., loss=...) #optimizer state would be reset
Set the original weights to the new clone model
model.load_weights(file_weights) #load weights
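The four steps above can be sketched end to end as follows. This is only a minimal sketch: the toy Sequential model, its layer sizes, the optimizer and the loss are hypothetical placeholders, and the weights are kept in memory with get_weights()/set_weights() instead of going through a weights file.

```python
from tensorflow import keras

# Hypothetical toy model; the layer sizes are placeholders.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1),
])

# Keep the current weights in memory before cloning.
saved_weights = model.get_weights()

# 1. Change the rate on every dropout layer.
for layer in model.layers:
    if isinstance(layer, keras.layers.Dropout):
        layer.rate = 0.04

# 2. Clone: layers are rebuilt from their configs, so the new rate is used,
#    but the weights are freshly initialized.
new_model = keras.models.clone_model(model)

# 3. Compile the clone (the optimizer state starts from scratch).
new_model.compile(optimizer="adam", loss="mse")

# 4. Copy the original weights into the clone.
new_model.set_weights(saved_weights)
```

If the weights already live in a file, model.load_weights(file_weights) can take the place of step 4, as in the answer.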
Discussion on this problem can be found here.
I have a CNN model and I want to save it and load it for prediction in a different tab. But I am confused about whether the model.eval() part is included in what I save. I also don't know whether it would be better to use Model.checkpoint or model.save to save and load. Does anyone have an idea? Thank you in advance.
I faced the same dilemma between the two, so I have used both.
Calling model.eval() just switches the PyTorch model to evaluation mode: batch normalisation layers use their running mean and variance instead of per-batch statistics, and dropout layers are deactivated. You can save your model without calling model.eval() first, as the mode does not affect what is saved or how the saved model performs.
When saving a model, saving its state dictionary is preferred. This can be done as shown:
#declare class of model here
model = NeuralNetwork()
#add training code below
...
#saving model, the model will be saved at the intermediateWeightPath location
intermediateWeightPath = "./bestmodel.pth"
torch.save(model.state_dict(), intermediateWeightPath)
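For the loading side of the question, a minimal sketch could look like the following. The NeuralNetwork class here is a hypothetical stand-in (its layers are placeholders, not the asker's CNN); the point is only that the class has to be defined again before the state dictionary is loaded, and that model.eval() is called after loading because the mode is not stored in the file.

```python
import torch
from torch import nn


# Hypothetical stand-in for the NeuralNetwork class from the answer.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 2)
        )

    def forward(self, x):
        return self.net(x)


intermediateWeightPath = "./bestmodel.pth"

# Save only the parameters and buffers; the train/eval mode is not stored.
trained = NeuralNetwork()
torch.save(trained.state_dict(), intermediateWeightPath)

# Later, e.g. in another notebook: rebuild the class, then load the weights.
model = NeuralNetwork()
model.load_state_dict(torch.load(intermediateWeightPath))
model.eval()  # disable dropout, use running statistics in batch norm layers

# Inference does not need gradients.
with torch.no_grad():
    prediction = model(torch.ones(1, 4))
```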
I am fairly new to ML and am currently implementing a simple 3D CNN in python using tensorflow and keras. I want to optimize based on the AUC and would also like to use early stopping/save the best network in terms of AUC score. I have been using tensorflow's AUC function for this as shown below, and it works well for the training. However, the hdf5 file is not saved (despite the checkpoint save_best_only=True) and hence I cannot get the best weights for the evaluation.
Here are the relevant lines of code:
model.compile(loss='binary_crossentropy',
optimizer=keras.optimizers.Adam(lr=lr),
metrics=[tf.keras.metrics.AUC()])
model.load_weights(path_weights)
filepath = mypath
check = tf.keras.callbacks.ModelCheckpoint(filepath, monitor=tf.keras.metrics.AUC(), save_best_only=True,
mode='auto')
earlyStopping = tf.keras.callbacks.EarlyStopping(monitor=tf.keras.metrics.AUC(), patience=hyperparams['pat'],mode='auto')
history = model.fit(X_trn, y_trn,
batch_size=bs,
epochs=n_epochs,
verbose=1,
callbacks=[check, earlyStopping],
validation_data=(X_val, y_val),
shuffle=True)
Interestingly, if I only change monitor='val_loss' in the early stopping and checkpoint (not the 'metrics' in model.compile), the hdf5 file is saved but obviously gives the best result in terms of validation loss. I have also tried using mode='max' but the problem is the same.
I would very much appreciate your advice, or any other constructive ideas on how to work around this problem.
It turns out that even if you add a non-keyword metric, you still need to use its handle (its name as a string) to refer to it when you want to monitor it. In your case you can do this:
auc = tf.keras.metrics.AUC() # instantiate it here to have a shorter handle
model.compile(loss='binary_crossentropy',
optimizer=keras.optimizers.Adam(lr=lr),
metrics=[auc])
...
check = tf.keras.callbacks.ModelCheckpoint(filepath,
monitor='auc', # even use the generated handle for monitoring the training AUC
save_best_only=True,
mode='max') # determine better models according to "max" AUC.
If you want to monitor the validation AUC (which makes more sense), simply add val_ at the beginning of the handle:
check = tf.keras.callbacks.ModelCheckpoint(filepath,
monitor='val_auc', # validation AUC
save_best_only=True,
mode='max')
Another problem is that your ModelCheckpoint is saving the weights based on the minimum AUC instead of the maximum, which is what you want.
This can be changed by setting mode='max'.
What does mode='auto' do?
This setting essentially checks whether the argument of monitor contains 'acc' and, if so, sets the mode to 'max'. In any other case it uses mode='min', which is what is happening in your case.
You can confirm this here
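To make the effect of the mode concrete, here is a framework-free sketch. It is not the Keras source: resolve_mode and epochs_saved are hypothetical helpers that only mirror the rule described above ('acc' in the monitor name means 'max', anything else means 'min'), and the real callback may apply additional rules depending on the Keras version. The AUC values are made up.

```python
def resolve_mode(monitor, mode="auto"):
    """Pick 'min' or 'max' following the rule described in the answer."""
    if mode in ("min", "max"):
        return mode
    return "max" if "acc" in monitor else "min"


def epochs_saved(values, monitor, mode="auto"):
    """Return the epochs at which a save_best_only checkpoint would be written."""
    maximize = resolve_mode(monitor, mode) == "max"
    best = None
    saved = []
    for epoch, value in enumerate(values):
        if best is None:
            improved = True  # the first epoch always counts as an improvement
        elif maximize:
            improved = value > best
        else:
            improved = value < best
        if improved:
            best = value
            saved.append(epoch)
    return saved


# Made-up validation AUC values for four epochs.
val_auc = [0.60, 0.72, 0.70, 0.81]

saved_with_auto = epochs_saved(val_auc, "val_auc", mode="auto")
saved_with_max = epochs_saved(val_auc, "val_auc", mode="max")
```

With these values, the 'auto' run keeps only the first epoch, because the name 'val_auc' falls through to 'min' and no later AUC is lower, while mode='max' also saves the epochs where the AUC actually improves.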
The answer posted by Djib2011 should solve your problem. I just wanted to address the use of early stopping. Typically this is used to stop training when overfitting starts to cause the validation loss to increase. I think it is more effective to address the overfitting issue directly, which should enable you to achieve a lower loss. You did not list your model, so it is not clear how to address overfitting, but some simple guidelines are as follows. If you have several dense hidden layers at the top of the model, delete most of them and keep only the final top dense layer. The more complex the model, the more prone it is to overfitting. If that leads to lower training accuracy, then keep the layers but add dropout layers. You might also try using regularization in the hidden dense layers. I also find it beneficial to use the ReduceLROnPlateau callback. Set it up to monitor AUC and reduce the learning rate if it fails to improve.
I am using transfer learning with keras.applications.InceptionV3. I managed to train the model successfully.
However, when I want to generate "activation maximisation" images (i.e. the input image that maximizes the activation of one of the custom classes, see e.g. https://arxiv.org/pdf/1512.02017v3.pdf ), I struggle to use the pre-trained model, since I do not manage to use it in "fit" mode and disable all dropouts etc. at the same time.
What I do is that I combine the pre-trained model in a tf.keras.Sequential to do gradient descent on the weights of the first layer (the input image).
Despite setting base_model.trainable = False, however, it seems that the pre-trained model is put into training mode (although its weights are not updated) when calling model.fit(data) on the outer sequential model.
Is there any way to force the base_model (a child of a Sequential) to be in "predict" mode when calling fit on the outer?
I just came across the same question. After reading some documentation and having a look at the source code of TensorFlow's implementations of tf.keras.layers.Layer, tf.keras.layers.Dense, and tf.keras.layers.BatchNormalization, I arrived at the following understanding.
If training = False is passed when calling the layer, it will run in inference mode. This has nothing to do with the attribute trainable, which means something different. It would probably cause less confusion if the argument had been called training_mode instead.
When doing transfer learning or fine-tuning, training = False should be passed when calling the base model itself. As far as I have seen so far, this only affects layers like tf.keras.layers.Dropout and tf.keras.layers.BatchNormalization and has no effect on the other layers.
Running in inference mode via training = False means that tf.keras.layers.Dropout does not apply the dropout rate at all.
As tf.keras.layers.Dropout has no trainable weights, setting the attribute trainable = False has no effect on it at all.
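The difference between the two flags can be illustrated with a small sketch. The toy base_model below is a hypothetical placeholder for a pre-trained network such as InceptionV3, and the functional wrapper is just one way (assumed here, not taken from the question) to pass training=False when calling the base model.

```python
import numpy as np
import tensorflow as tf

x = np.ones((2, 4), dtype="float32")

# A standalone Dropout layer: `trainable` does not switch dropout off,
# only the `training` argument of the call does.
drop = tf.keras.layers.Dropout(0.5)
drop.trainable = False
out_inference = np.asarray(drop(x, training=False))  # input passes through
out_training = np.asarray(drop(x, training=True))    # entries are 0 or 2.0

# Hypothetical stand-in for a pre-trained base model.
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3),
    tf.keras.layers.Dropout(0.5),
])
base_model.trainable = False  # freezes the weights only

# Calling the base model with training=False keeps it in inference mode,
# even when the outer model itself runs in training mode.
inputs = tf.keras.Input(shape=(4,))
features = base_model(inputs, training=False)
outputs = tf.keras.layers.Dense(1)(features)
model = tf.keras.Model(inputs, outputs)

out_outer_training = np.asarray(model(x, training=True))
out_outer_inference = np.asarray(model(x, training=False))
```

This wiring relies on the functional API rather than placing the base model directly inside a tf.keras.Sequential, because the outer model then has an explicit call site where training=False can be passed.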
I was wondering why we need to save the model during training.
Isn't it enough to save it once at the beginning of training and then only save the weights during training?
I mean, the model isn't changing during training, so what is this boolean needed for?
class ModelCheckpoint(Callback):
...
save_weights_only: if True, then only the model's weights will be saved.
...
Thanks !
It's not a need or a requirement, it's just a convenience. In a typical DL/DS workflow, you train a lot of models with different configurations and it is quite easy to get lost. Maybe you have saved the weights for the best model but you don't remember which model configuration was used. That information is not part of the weights and has to be recorded separately.
So Keras provides a simple solution: store the model (which takes less than 10 KB) along with the weights, so that if you lose the original model configuration, it is still saved in the same HDF5 file.
Also consider the case where you send the model weights to someone else without the model configuration: how can they load the weights without a model? Again, it's just a convenience.
I want to train a model to classify 90K labels, so I used the so called incremental training.
I initially train the model to classify only 1K labels, then add another 1K labels and expand the final FC layer's output dimension to 2K, and train for some more epochs. After that I add another 1K labels, and so on...
Note that this is NOT fine-tuning, in which ALL parameters before the last FC layer are fixed so that the output features can be cached. In my case I need to update all variables in every stage.
The solution I designed is:
train for 1K labels.
save the model.
modify the graph to let the last FC layer output 2K dimension.
initialize all variables
load the previous checkpoint, which will overwrite all parameters except the last layer's weights.
train again and repeat
So the key point here is to realize partial restore checkpoints.
In TensorFlow, I use such code to load a checkpoint:
saver.restore(sess, "model.ckpt")
However, it fails when there is a shape mismatch.
Could anyone help, either in how to partially restore/initialize variables, or how to implement incremental training in another way?
This is currently not simple to do. We are actively adding new APIs to make it easier.
In the meantime, if you are really determined, :), you can try the following when you change the FC layer's size:
Create a reader:
reader = tf.train.NewCheckpointReader(your_checkpoint_file)
Load all the variables in the checkpoint file:
cur_vars = reader.get_variable_to_shape_map().keys()
Remove the original FC layer:
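# keep the graph variables that exist in the checkpoint, except the FC layer's
cur_vars_without_fc = [v for v in tf.global_variables() if v.op.name in cur_vars and v.op.name != your_fc_layer_var_name]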
Create a saver with these variables:
saver = tf.train.Saver(cur_vars_without_fc)
saver.restore(sess, your_checkpoint_file)
Initialize your new FC layer's variables:
sess.run([your_fc_layer_var.initializer])
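The selection logic behind these steps can be sketched without TensorFlow. Everything in this sketch is hypothetical (the partial_restore helper, the variable names, the shapes and the string placeholders standing in for tensors); it only illustrates restoring whatever matches by name and shape and reporting what still needs a fresh initialization.

```python
def partial_restore(checkpoint, model_vars, skip=()):
    """Copy checkpoint entries into model_vars where name and shape match.

    Both dicts map a variable name to a (shape, values) tuple.
    Returns the names that were not restored and still need initializing.
    """
    not_restored = []
    for name, (shape, _) in list(model_vars.items()):
        saved = checkpoint.get(name)
        if name in skip or saved is None or saved[0] != shape:
            not_restored.append(name)
            continue
        model_vars[name] = saved
    return not_restored


# Checkpoint from the 1K-label stage (values are placeholders for tensors).
checkpoint = {
    "conv1/weights": ((3, 3, 3, 64), "trained conv1"),
    "fc/weights": ((256, 1000), "trained fc, 1K labels"),
}

# Graph for the 2K-label stage, freshly initialized.
model_vars = {
    "conv1/weights": ((3, 3, 3, 64), "random init"),
    "fc/weights": ((256, 2000), "random init"),
}

to_initialize = partial_restore(checkpoint, model_vars)
```

Here the convolution weights are taken from the checkpoint, while the enlarged FC layer is reported back because its shape no longer matches, which corresponds to the variable that gets its initializer run in the last step.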
Hope that helps!
Sherry